Dataiku DSS – Data Science Studio

Dataiku DSS (Data Science Studio) is a collaborative data science platform designed to help scientists, analysts, and engineers explore, prototype, build, and deliver their own data products with maximum efficiency. DSS can run locally, within a database or in a distributed environment. It also integrates with Python and R. DSS allows the user to push computations to different engines (e.g., R, Python, Database, Hadoop/Spark).

One of DSS’s main strengths lies in its collaboration features. The software lets teams work jointly on modeling workflows, add documentation to a workflow, and track changes via an integrated Git repository. In addition, it has data preparation functions that automate parts of the feature engineering process. Models can be deployed in DSS using the automation node or the real-time API. Recent developments have enhanced DSS’s reporting and visualization capabilities, user management, and automated machine learning capabilities.

The software can be deployed in several ways and provides an all-in-one analytics and data science system that combines integrated coding with a visual interface. Notebooks are available for languages and engines such as R, Python, Hive, and Spark. A customizable drag-and-drop visual interface may also be used at any stage of prototyping a predictive dataflow, from wrangling through analysis to modeling.

Data-agnostic integration, with over 30 data connectors and custom plugin extensions, connects users to their existing infrastructure. Dataiku DSS also detects data formats and schemas, and can push computation down to existing SQL, Hadoop, or Spark infrastructure.
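The idea of pushing computation to an existing engine can be illustrated with a minimal sketch. It does not use any Dataiku API: an in-memory SQLite database stands in for an existing SQL engine, and the table and column names (`orders`, `region`, `amount`) are hypothetical. The first function pulls every row into Python before aggregating, while the second lets the database do the work and returns only the aggregated result.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data standing in for a table in an existing database.
ROWS = [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0)]

def make_connection():
    """Create an in-memory SQLite database with a small `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    return conn

def totals_local(conn):
    """Pull every row out of the database and aggregate in Python."""
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] += amount
    return dict(totals)

def totals_pushed_down(conn):
    """Let the database aggregate; only one row per region leaves the engine."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return dict(conn.execute(query).fetchall())

conn = make_connection()
local_result = totals_local(conn)
pushed_result = totals_pushed_down(conn)
```

Both functions return the same totals. The difference is where the work happens and how much data crosses the connection, which matters when the table holds far more rows than this example.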

A visual profile of organizational data at each step of an analysis allows users to interactively explore, prepare, enrich, blend, and clean data. The platform also builds on machine learning libraries such as scikit-learn, MLlib, and XGBoost.

Dataiku DSS has a visual UI and tools to create and upgrade models in Python or R, with instant visual and statistical feedback on model performance. Production deployment solutions bundle an entire workflow, while monitoring and version control ensure that deployments run with proper data validation policies.

Dataiku is aimed at those who develop using open source languages and need to add platform capabilities such as deployment, collaboration, and user management. It offers interactive dashboards with a variety of charts, including a map engine for spatial data, as well as support for web apps, which make it possible to build and display custom visualizations. The software is available on-premises and for cloud deployment, and is accessed via a web browser.

Collaboration

Integrated documentation and knowledge sharing: Dataiku is designed from the ground up for data teams. Collaboration features make it easy to share knowledge among team members and onboard new users much faster. You can add detailed descriptions to your Dataiku objects (datasets, code, models…) and tag, comment on, and favorite any of them. Dataiku DSS also lets you engage with other users of the platform through Discussions and create Wikis to document your projects.

Change management: Every action in the system is versioned and logged through an integrated Git repository. You can follow each action in the timeline in the interface and easily roll back to previous versions.

Team activity monitoring: Dedicated dashboards help project managers keep an eye on their team’s activity. They show which projects are active or inactive and let you monitor the team’s progress and commits.

Model Deployment

Easy to deploy: The software empowers analysts and data scientists to deploy models into production in a few clicks. Data cleaning, enrichment, and preprocessing steps are bundled together with the models for simplified scoring pipelines. Deployed models are versioned, enabling users to deploy new versions, compare them, and roll back at any time.

Scalability & high availability: You can handle large quantities of real-time predictions with queuing, parallelism, and load balancing, and run multiple scoring nodes for full high availability. Automatic elastic scaling handles unexpected traffic surges.

Deployment with Kubernetes: You can deploy your API on-premises or in the cloud with fully native Kubernetes integration for elastic and reproducible deployments. Full GPU support for deep learning models is also available.

Powerful API engine: You can deploy visual models, custom Python or R models, custom Python or R functions, or SQL queries as an API. The REST API is easy to use, and ready-to-use code samples are generated automatically.
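As a rough illustration of what calling such a prediction endpoint can look like, here is a minimal sketch that builds a JSON POST request with the Python standard library without sending it. The URL layout, the `features` payload key, the host address, and the service and endpoint identifiers are all assumptions made for this example; the code samples generated by DSS for a given deployment are the authoritative reference.

```python
import json
import urllib.request

def build_predict_request(base_url, service_id, endpoint_id, features):
    """Build (but do not send) a JSON POST request for one prediction.

    The path layout and the {"features": ...} body are assumptions for
    illustration, not a confirmed description of the DSS API.
    """
    url = f"{base_url.rstrip('/')}/public/api/v1/{service_id}/{endpoint_id}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request(
    "http://localhost:12000",  # hypothetical scoring node address
    "churn_service",           # hypothetical service id
    "churn_model",             # hypothetical endpoint id
    {"age": 42, "plan": "premium"},
)
# To send it, pass `request` to urllib.request.urlopen and decode the JSON reply.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any traffic reaches a live scoring node.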

Feedback loop: You can run multiple versions of the same model at the same time for automated A/B testing and monitor data changes over time. You can access the history of logged queries and predictions at any time to check that model performance is not drifting.
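One common way to split traffic between two model versions is to hash a stable request key, so that the same caller always reaches the same version. The sketch below shows that general technique only; it is not a description of how DSS assigns traffic, and the function name, key format, and split value are hypothetical.

```python
import hashlib

def assign_version(request_key, split=0.5):
    """Deterministically route a key to version "A" or "B".

    The key is hashed to a number in [0, 1); keys below `split` go to "A".
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return "A" if bucket < split else "B"

# Count how a batch of hypothetical customer keys is distributed.
keys = [f"customer-{i}" for i in range(1000)]
counts = {"A": 0, "B": 0}
for key in keys:
    counts[assign_version(key)] += 1
```

Because the assignment depends only on the key, logged predictions can later be grouped by version and compared without storing the assignment separately.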

Visual Machine Learning and Modeling

Automated machine learning: Automatic feature engineering, generation, and selection let you use any kind of data in your models. You can optimize model hyperparameters using various cross-validation strategies and compare dozens of algorithms from the Dataiku interface, for both supervised and unsupervised tasks. You can also get instant visual insights from your model (variable importance, feature interactions, or parameters) and assess its performance through detailed metrics.
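To make the idea of hyperparameter optimization with cross-validation concrete, here is a minimal, dependency-free sketch. It is not what DSS runs internally: it assumes a one-feature ridge regression through the origin, contiguous k-fold splits, and a small hypothetical dataset, and it picks the penalty with the lowest average validation error.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, validation) index lists for contiguous k-fold splits."""
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        start = fold * fold_size
        stop = n_samples if fold == n_folds - 1 else start + fold_size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation

def fit_ridge_slope(xs, ys, penalty):
    """Closed-form slope of a one-feature ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def cv_error(xs, ys, penalty, n_folds=4):
    """Mean squared validation error averaged over the folds."""
    fold_errors = []
    for train, validation in k_fold_indices(len(xs), n_folds):
        slope = fit_ridge_slope([xs[i] for i in train], [ys[i] for i in train], penalty)
        errors = [(ys[i] - slope * xs[i]) ** 2 for i in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

def grid_search(xs, ys, penalties):
    """Return the penalty with the lowest cross-validated error, plus all scores."""
    scores = {p: cv_error(xs, ys, p) for p in penalties}
    return min(scores, key=scores.get), scores

# Hypothetical data: y is roughly 2 * x with a small alternating disturbance.
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]
best_penalty, scores = grid_search(xs, ys, [0.0, 1.0, 10.0, 100.0])
```

The same loop structure applies to any model and any grid: each candidate setting is scored only on data it was not trained on, and the scores are compared across candidates.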

Deep learning: With Dataiku DSS you can define your model architecture and personalize training settings. It supports Keras with a TensorFlow backend, integrates with TensorBoard, and can scale model training using GPUs. It can also handle images automatically, including feature extraction, and lets you use pretrained models right from the Dataiku interface.

Machine learning in production: As soon as your model is built and assessed, you can use it for batch scoring within your data workflow and deploy it as a real-time prediction service (REST API). Dataiku DSS gives you the tools to manage your model lifecycle easily: deploy new versions, retrain previous versions, and roll back to any earlier version in one click, and track your model’s performance over time with a feedback loop.

Dataiku DSS Benefits

With Dataiku DSS, converting raw data into usable, real-time predictions requires only one interface from start to finish. Users can explore, wrangle, and prepare data without worrying about format, storage, accessibility, and the like. The platform offers more than 25 connectors, with the option to create your own. Users can access all kinds of data at any time: big or small, structured or unstructured, internal or external.

Dataiku’s Quick Columns View allows users to see the quality of data in real time, from duplicates and invalid values to completeness and accuracy to distribution and outliers. Full statistical summaries are also provided with just one click.
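The kind of per-column quality summary described above can be sketched in a few lines of plain Python. This is an illustration of the concept, not the computation DSS performs; the function name, the rule that `None` and empty strings count as missing, and the sample column are assumptions.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: completeness, distinct values, and duplicates."""
    total = len(values)
    # Treat None and empty strings as missing (an assumption of this sketch).
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "total": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical column with two empty entries and one repeated value.
emails = ["a@example.com", "b@example.com", None, "a@example.com", ""]
summary = profile_column(emails)
```

A profile like this, computed for every column, is enough to flag columns that are mostly empty or that contain unexpected repeats before any modeling starts.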

Dataiku DSS has data preparation tools for fast and reliable advanced analytics and predictive modeling. A spreadsheet-type interface, plus automatically suggested contextual transformations, makes performing mass actions on data less tedious for everyone.

Retrieval is not a problem, as Dataiku DSS has more than 90 built-in visual processors that help filter, search, and wrangle data without code. Users may also customize solutions with business-specific, tailored transformation types. Formulas and Python scripts also allow custom processing.
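A custom Python processing step typically takes the form of a small function applied to each row. The sketch below assumes a row-wise `process(row)` function that receives the row as a dictionary and returns the value for a new column; that signature and the column names `first_name` and `last_name` are assumptions for illustration and should be checked against the DSS documentation.

```python
def process(row):
    """Return a cleaned, title-cased full name built from two columns."""
    # Missing values (None) are treated as empty strings.
    first = (row.get("first_name") or "").strip()
    last = (row.get("last_name") or "").strip()
    full_name = " ".join(part for part in (first, last) if part)
    return full_name.title()

# Hypothetical rows, as dictionaries keyed by column name.
rows = [
    {"first_name": "  ada ", "last_name": "LOVELACE"},
    {"first_name": "grace", "last_name": None},
]
cleaned = [process(row) for row in rows]
```

Writing the transformation as a pure function of one row keeps it easy to test on a handful of sample rows before applying it to a full dataset.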

Dataiku DSS Dashboards make it easy to create interactive charts and visualizations from datasets. With 25 built-in chart formats, such as histograms, boxplots, and maps, users can drag and drop data to automatically compute charts on existing Big Data infrastructure (SQL or Impala), resulting in optimal performance.

About Dataiku

Dataiku is a computer software company headquartered in New York City. The company develops collaborative data science software marketed for big data. The company was founded in Paris in 2013 by four co-founders. Two of them, chief executive Florian Douetteau and Clément Sténac, met while working at the French search engine company Exalead.

Dataiku opened an office in New York City in 2015, which became the company headquarters. It opened an office in London in the summer of 2016 and announced an office in Sydney in February 2019. The software, Dataiku Data Science Studio (DSS), was announced in 2014, supporting predictive modelling to build business applications. Later versions of DSS added other features. Dataiku offers a free edition and enterprise versions with additional features, such as multi-user collaboration or real-time scoring.
