DataRobot For Automated Machine Learning
DataRobot automates the entire modeling lifecycle, enabling users to build and deploy highly accurate predictive models in a fraction of the time required by traditional data science methods.
The DataRobot platform automatically searches through millions of combinations of algorithms, data preprocessing steps, transformations, features, and tuning parameters for the best machine learning model for the data.
The intuitive web-based interface allows anyone to interact with a very powerful platform, regardless of skill level or machine learning experience. Users can drag and drop a dataset and let DataRobot do all the work, or they can write their own models for evaluation by the platform.
DataRobot features a massively parallel modeling engine that can scale to hundreds or even thousands of powerful servers to explore, build, and tune machine learning models.
Automated machine learning process
A machine learning model can be deployed in five steps:
- Upload the dataset to the DataRobot platform, which accepts input from a file, a remote URL, a JDBC data source, or HDFS.
- The platform infers the schema by suggesting appropriate data types for each feature. Business analysts and data scientists can perform basic to advanced exploratory data analysis on the ingested dataset. Finally, they select the target label that the model is going to predict.
- After pressing Start, DataRobot creates a long queue of algorithms and trains each of them with the uploaded dataset.
- Users can explore each model to understand the methodology and the parameters used in the training process. They can test each model with a test dataset to measure the accuracy.
- Once a model has been chosen from the leaderboard, it is deployed as an API. The endpoint becomes ready to deal with production data. Any developer with the API key can invoke this like any other RESTful service.
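The last step above can be illustrated with a minimal sketch of how a client might prepare a call to a deployed model. The endpoint URL, the `Authorization` header scheme, and the JSON payload shape are all assumptions made for illustration; the real values come from your own deployment. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint URL, header names, and payload
# format come from your own deployment, not from this sketch.
ENDPOINT_URL = "https://example.invalid/deployments/my-model/predictions"
API_KEY = "YOUR_API_KEY"


def build_prediction_request(rows, url=ENDPOINT_URL, api_key=API_KEY):
    """Build (but do not send) a JSON POST request carrying rows to score."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_prediction_request([{"age": 42, "income": 55000}])
# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before any production data leaves the client.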
DataRobot Features
Innovative open source algorithms. DataRobot uses the latest and most powerful open source machine learning libraries, including scikit-learn, H2O, TensorFlow, Vowpal Wabbit, Spark ML and XGBoost.
Automated feature engineering. DataRobot prepares data automatically, performing operations like one-hot encoding, missing value imputation, text mining, and standardization to transform features for optimal results.
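To make these operations concrete, here is a small plain-Python sketch of mean imputation, standardization, and one-hot encoding. It only illustrates what each transformation does to a column; the function names are made up for this example, and it is not a description of how DataRobot implements them.

```python
from statistics import mean, pstdev


def impute_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Map each category to a 0/1 indicator vector, columns in sorted order."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


ages = impute_missing([20, None, 40])          # the gap is filled with the mean, 30
scaled = standardize(ages)                     # centered on 0
columns, encoded = one_hot(["red", "blue", "red"])
```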
Time-aware forecasting. DataRobot can automate the development of sophisticated time series models that predict the future values of a data series based on its history and trend. The platform automatically detects stationarity and seasonality, and implements backtesting to achieve the highest possible accuracy.
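Backtesting can be sketched as follows, assuming an expanding training window and fixed-size validation windows taken from the end of the series. The fold layout, the function names, and the last-value forecast used as the model are illustrative choices, not DataRobot's actual procedure.

```python
def backtest_splits(n_obs, n_backtests, validation_size):
    """Yield (train_indices, validation_indices) pairs, oldest fold first.

    Each validation window sits strictly after its training window, so the
    model is never evaluated on data older than what it was trained on.
    """
    for k in range(n_backtests, 0, -1):
        val_end = n_obs - (k - 1) * validation_size
        val_start = val_end - validation_size
        if val_start <= 0:
            raise ValueError("not enough observations for this many backtests")
        yield list(range(0, val_start)), list(range(val_start, val_end))


def naive_forecast_mae(series, n_backtests=2, validation_size=2):
    """Score a last-value forecast on each backtest; return the MAE per fold."""
    scores = []
    for train_idx, val_idx in backtest_splits(len(series), n_backtests, validation_size):
        last_seen = series[train_idx[-1]]
        errors = [abs(series[i] - last_seen) for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return scores


series = [10, 12, 13, 15, 18, 21, 20, 22]
fold_scores = naive_forecast_mae(series)
```

Unlike random cross-validation, every fold here respects time order, which is what makes the resulting error estimate meaningful for forecasting.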
Multiclass model support. DataRobot allows for classification on targets with up to 100 distinct values, offering real-time and batch support for identifying the predicted class and showing its probability across all classes.
Built-in guardrails. With DataRobot, modeling projects follow a consistent methodology based on data science best practices. Novice users can’t “forget” to perform a critical step, such as model validation.
Advanced machine learning techniques. DataRobot incorporates the techniques advanced data scientists use: boosting, bagging, random forests, kernel-based methods, generalized linear models, deep learning, and many others.
Unsupervised anomaly detection. Anomalies can be uncovered in a dataset with DataRobot’s unsupervised ensemble blend model, which can offer new insights, even in familiar datasets.
Manual tuning capabilities. DataRobot automates model tuning, but also supports manual tuning so you can tune and adjust machine learning algorithms for even better results.
Visual AI
Visual AI provides the ability to include images in your supervised machine learning pipeline. Similar to other DataRobot projects, Visual AI projects deliver both deployable models and associated model insights.
Visual AI has a wide variety of highly efficient, state-of-the-art deep learning featurizers, including SqueezeNet, ResNet50, Xception, EfficientNet, and others. Each of these featurizers converts an input image to a vector of numbers.
Visual AI has extra tools that are specific to the image data type which were created to enhance model insights.
- Image Activation Maps allow you to see sample locations in the image that the model is using to make decisions.
- Image Embeddings allow you to visualize a sample of images projected from their original N-dimensional feature space to a new 2-dimensional feature space. This makes it easy to see what images are considered similar.
Enterprise application
DataRobot can be deployed on-premises on standalone servers, on an existing Hadoop infrastructure, or in a Virtual Private Cloud (VPC). It is also available as a managed SaaS offering hosted on Amazon Web Services (AWS).
DataRobot can be installed as a service on YARN in Hadoop clusters, and perform distributed model scoring on data stored on HDFS.
DataRobot offers fine-grained role-based security, including two-factor authentication, and supports Kerberos and LDAP protocols.
DataRobot’s MLOps
DataRobot’s MLOps capabilities are integrated into the DataRobot Enterprise AI Platform and allow you to manage all production models from one place. They support flexible deployment of custom models built with Python, R, or other compatible machine learning platforms. Models embedded anywhere and built by anyone can be managed centrally across the organization.
MLOps also enables proactive management of production models to prevent production issues and maintain both model trust and performance. Real-time dashboards with automated monitoring and alerts provide information on data deviations and key model metrics, so teams can adapt quickly to changing conditions. They also provide deep production diagnostics to improve failure prediction, minimize SLA violations, and optimize system operations.
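One common way to quantify a data deviation between training data and production data is the Population Stability Index (PSI). The sketch below is a generic illustration of that metric using equal-width bins; it is not taken from DataRobot, and the bin count and sample values are arbitrary assumptions.

```python
import math


def population_stability_index(expected, actual, n_bins=4):
    """Compare two numeric samples using equal-width bins over the training range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            # Values outside the training range fall into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [1, 2, 3, 4, 5, 6, 7, 8]
production = [5, 6, 7, 8, 9, 10, 11, 12]   # shifted upward relative to training

no_drift = population_stability_index(training, training)
drift = population_stability_index(training, production)
```

A monitoring job could compute a score like this per feature on a schedule and raise an alert when it exceeds a threshold chosen by the team.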
MLOps also helps AI projects scale safely while maintaining control over production models and compliance with regulations. Access to production models and systems can be limited appropriately to manage organizational risk and satisfy regulatory requirements. Individual predictions can be traced back to the production model that made them, which supports legal and regulatory compliance. MLOps also keeps an audit trail over the lifetime of a model deployment, showing when and where the model was deployed, who made updates, and why.
Paxata for automated data preparation
Paxata is an enterprise-grade, self-service data preparation application that automates raw data preparation, including finding, ingesting, profiling, cleaning, and transforming raw data. The Paxata Adaptive Information Platform can create actionable information in real time with the help of machine learning, an intuitive user experience, and a smart distributed architecture.
It can ingest data from a wide variety of enterprise sources, including complex semi-structured files such as XML or JSON, NoSQL and relational databases, and cloud applications. Paxata intelligently detects the data source type and transforms the data into a tabular format for point-and-click interaction and profiling.
Paxata profiles data and generates a scorecard showing data type distribution, field completeness, field length analysis, top/bottom patterns, leading or trailing string patterns, min/max and range, special character analysis, custom calculations, and more. Its interactive, Excel-like interface allows you to search, investigate, and discover trends, outliers, and patterns across the entire dataset, not just a sample, and to validate your data visually for immediate feedback. You can also combine and blend multiple, mixed-type data sources.
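A few of the scorecard metrics above (field completeness, data type distribution, min/max, field length) can be illustrated with a short sketch. The function name and the output layout are made up for this example and do not reflect Paxata's actual scorecard format.

```python
def profile_column(values):
    """Summarize one column: completeness, inferred types, min/max, max length."""
    # Treat None and empty strings as missing.
    present = [v for v in values if v not in (None, "")]
    type_counts = {}
    for v in present:
        name = type(v).__name__
        type_counts[name] = type_counts.get(name, 0) + 1
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "types": type_counts,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "max_length": max((len(str(v)) for v in present), default=0),
    }


report = profile_column([3, None, 10, "n/a"])
```

In this sample column, one of four values is missing and one is a string in an otherwise numeric field, which is exactly the kind of issue a profile is meant to surface before modeling.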
The software standardizes similar values and misspellings, and handles joins, appends, and overlaps across data sources with smart machine learning recommendations. It then documents your steps to provide repeatability, auditing, and governance.
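Standardizing misspellings can be approximated with simple fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library as a stand-in for Paxata's machine learning recommendations; the canonical list, the function name, and the 0.8 similarity cutoff are assumptions for illustration.

```python
import difflib


def standardize_values(values, canonical, cutoff=0.8):
    """Map each value to its closest canonical spelling, if one is close enough."""
    lookup = {c.lower(): c for c in canonical}
    cleaned = []
    for value in values:
        match = difflib.get_close_matches(
            value.strip().lower(), list(lookup), n=1, cutoff=cutoff
        )
        # Values with no sufficiently similar canonical form are left unchanged.
        cleaned.append(lookup[match[0]] if match else value)
    return cleaned


cities = standardize_values(
    ["new york", "New Yrok", "Bostn", "Chicago"],
    canonical=["New York", "Boston"],
)
```

Leaving unmatched values untouched keeps the cleanup conservative, so a reviewer can decide what to do with them.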
Result datasets can be exported and published to a broad range of databases, applications, AI platforms such as DataRobot, and popular BI tools such as Tableau, Qlik, MicroStrategy, Power BI, and many others.
DataRobot Integration with FactSet
DataRobot on FactSet integrates automated machine learning technology into the FactSet workstation, which allows you to build, deploy, monitor, and manage sophisticated machine learning models quickly and easily. You can access hundreds of powerful open source machine learning algorithms to create predictive, automated AI applications for factors such as equity volatility, bond performance, and macroeconomic event predictions. This allows you to:
Create AI-driven investment decision workflows. Combine DataRobot’s automated machine learning with FactSet’s content and applications to embed AI directly into data exploration, signal generation, and evaluation workflows in an integrated manner.
Connect core datasets to advanced machine learning algorithms. Power automated machine learning workflows with core datasets including fundamentals and estimates, premium unique content such as sentiment and ESG, and your own proprietary content.
Enhance factor research with automated machine learning. Easily build and deploy machine learning models to produce more intelligent alpha signals, risk forecasts, and macro indicators within FactSet applications spanning the entire portfolio lifecycle.
About DataRobot
DataRobot is a Boston-based tech company with offices in New York, London, Kyiv, Singapore, Tokyo, and Sydney. Backed by prominent venture investors like NEA, Atlas Venture, and TechStars, it develops an automated machine learning platform that helps data scientists and analysts of all levels build and deploy better predictive models and uncover valuable business insights. DataRobot works with global brands such as Mitsubishi Heavy Industries, Nippon Steel, Airbnb, United Airlines, Panasonic, and others. It enables advanced predictive analytics at many Fortune 500 enterprises, including the largest US banks, insurers, fintechs, energy companies, and healthcare institutions.