Dataiku’s DSS Increases Scale and Improves Speed of Analytics with Apache Spark

09.29.2015

Dataiku, the maker of Data Science Studio (DSS), today announced integration with the advanced data processing engine Apache Spark. By adopting Spark, data analysts can process much larger Hadoop data sets, ranging into the terabytes, and process that information much more quickly.

Pairing the capabilities of Apache Spark with the advanced analytics features of DSS creates significant opportunities for those looking to leverage very large data sets. DSS provides an integrated development environment (IDE) that gives developers the tools to rapidly build ad hoc queries. Those queries are then processed against selected data sets, creating visual representations of the relationships found in the data.

Visual Recipes, a core component of DSS, can now be executed on the Apache Spark framework, leveraging the SparkSQL programming language and data processing engine. That helps DSS users perform tasks such as joins and aggregations dozens if not hundreds of times faster than could be accomplished with Hadoop using Apache Hive.
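
For readers unfamiliar with the kind of work described here, the following is a minimal sketch of a join followed by an aggregation, written as plain SQL. It is an illustration only, not Dataiku's implementation: the table names, column names, and sample rows are hypothetical, and the query runs against Python's built-in sqlite3 module so that the example is self-contained. On a Spark cluster, a query of this shape would typically be submitted through PySpark's spark.sql(...) instead.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only
# and do not come from DSS or from the announcement.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER,
                         amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC'), (3, 'EMEA');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 80.0),
                              (12, 3, 50.0), (13, 1, 30.0);
""")

# Join orders to customers, then aggregate order count and revenue by region.
QUERY = """
    SELECT c.region,
           COUNT(*)      AS order_count,
           SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""

rows = conn.execute(QUERY).fetchall()
for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

The SQL text is the portable part of this sketch. The speed-up described above comes from the engine that executes the query, not from the query itself.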

Apache Spark integration also gives DSS the ability to work with SparkR, SparkSQL, and PySpark, which bring R, SQL, and Python-based programming to the Spark environment. Like the other components of Spark, PySpark and SparkR ease and speed up the native capabilities found in DSS and make Spark a viable alternative to the traditional Hadoop/Hive stack, while also allowing analysts to share data engineering recipes and limiting the need to recode or redevelop algorithms.

The integration of Apache Spark brings many other advantages, all of which dovetail with the inherent capabilities of DSS. Those advantages include:

  • Data Volume: Spark enables data analysts to use DSS to deploy advanced algorithms across several hundred gigabytes of data.
  • Collaboration: The PySpark and SparkR frameworks make it easier for team members to share cluster resources.
  • Education: DSS offers a unified interface for multiple frameworks, allowing users to delve immediately into the capabilities of Apache Spark without having to learn the intricacies of a new set of frameworks, dialects, and languages.
  • Future Proof: Thousands of contributors are continually working to enhance the Spark project, creating new standards and enhancements, which are rolled into the DSS/Spark environment.

Another important element that DSS brings to the table with Apache Spark is the ability to train models using both MLlib and scikit-learn. With MLlib in the mix, users can address large-scale projects by modeling the data sources in their entirety. That in turn allows analysts to leverage the full cluster of data services and avoid the problems normally associated with a “divide and conquer” approach that may miss important segments of data.

The addition of Apache Spark to the extensive number of datastores already supported by DSS allows analysts to create large-scale big data analytics projects without the risk of reaching beyond the capabilities offered by the data engines currently in use.

dataiku.com

Sep 29, 2015, Cassie Balentine