MapR Simplifies End-to-End Workflow for Data Scientists

2.8.18

MapR Technologies, Inc., a pioneer in delivering one platform for all data, across every cloud, today announced the immediate availability of the MapR Expansion Pack (MEP) 4.1, enabling data scientists and engineers to create scalable deep learning pipelines, make operational data instantly available for data science, and achieve over 2X performance improvements across a variety of data discovery and ad-hoc queries. MEP 4.1 expands the ability to build real-time pipelines and brings data science capabilities to a broad set of users with new language support.

“We are focused on enabling a variety of users to be productive on MapR on both development and deployment, allowing them to leverage all data effortlessly regardless of the data’s location on-prem, in the cloud or at the edge,” said Neeraja Rentachintala, senior director, database and analytics, MapR. “The latest release of the MapR Expansion Pack gives the data scientist community an option to use their language of choice, Python, for high-performance data analysis on operational data, defining real-time workflows to read/write data into MapR database, and pain-free deployment by simplifying the Python library distribution across the cluster.”

Complementing the MapR Converged Data Platform, the new features added to MapR-DB, MapR Data Science Refinery, and Apache Drill 1.12 in the MapR Expansion Pack 4.1 include:
• MapR Data Science Refinery extends support for distributing Python archives for PySpark. This allows data scientists to leverage popular Python data science libraries in a distributed way to create scalable deep learning pipelines.

• MapR Data Science Refinery enables Apache Zeppelin to easily leverage a diverse set of Python libraries and environments that can be shared and stored in MapR-XD.

• PySpark jobs can directly read and write to MapR-DB OJAI, making operational data instantly available for data science.

• Python and Java Bindings for MapR-DB OJAI Connector for Apache Spark enable developers to read/write to MapR-DB from Spark using Java and Python. With this, developers can now build data-intensive business applications in Java and Python.

• A new version of Apache Drill, Drill 1.12, enables fast data exploration on operational data in MapR-DB and historical data in Parquet for data scientists, with over 2X performance improvements across a variety of data discovery and ad-hoc queries.
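
As an illustration of the first item above, distributing Python archives for PySpark, the sketch below builds a spark-submit command that ships a packed Python environment to a cluster. It relies on the general Spark-on-YARN mechanism (the --archives option plus the PYSPARK_PYTHON settings) and is not necessarily how the MapR Data Science Refinery implements the feature. The helper name build_spark_submit, the archive path, and the application file name are hypothetical and chosen only for this example.

```python
import shlex
from typing import List


def build_spark_submit(app: str, archive: str,
                       alias: str = "environment",
                       master: str = "yarn") -> List[str]:
    """Return a spark-submit argument list that ships a packed Python env.

    Hypothetical helper for illustration only. `archive` is a packed
    environment (for example a tar.gz of a conda or virtualenv), and
    `alias` is the directory name it is unpacked under on each node.
    """
    python_bin = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        # "path#alias" asks YARN to unpack the archive under ./alias
        "--archives", f"{archive}#{alias}",
        # Point the driver (application master) and the executors
        # at the interpreter inside the unpacked archive.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_bin}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_bin}",
        app,
    ]


if __name__ == "__main__":
    # Assumed locations: adjust to wherever the archive is actually stored.
    cmd = build_spark_submit(
        app="pipeline.py",
        archive="maprfs:///apps/envs/ds_env.tar.gz",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string keeps each argument intact if it is later passed to subprocess.run, and keeping the archive in shared storage means every node resolves the same set of libraries.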

MapR-DB is a high-performance NoSQL (“Not Only SQL”) database management system built into the MapR Converged Data Platform. It is a highly scalable multi-model database that brings together operations and analytics, and real-time streaming and database workloads to enable a broader set of next-generation data-intensive applications in organizations.

The MapR Data Science Refinery provides complete, self-service access for DataOps teams to all data from within the same cluster. Data scientists are a driving force behind the DataOps movement where data analysis is increasingly powered by machine learning / artificial intelligence to gain quick, accurate, and actionable insights.

Apache Drill is an open source distributed SQL query engine integrated into the MapR Converged Data Platform, which offers fast and secure self-service BI SQL analytics at scale. With the ability to discover schemas on-the-fly, Drill’s distributed shared-nothing architecture enables incremental scale-out with low-cost hardware to meet increasing demands of query response and user concurrency.

Availability
New features for MapR-DB, MapR Data Science Refinery, and Apache Drill 1.12 are available now in the MapR Expansion Pack 4.1.

www.mapr.com

Feb 8, 2018, Olivia Cahoon
