State-of-the-art machine learning/AI systems consist of complex pipelines with choices of hyperparameters, models and configuration details that need to be tuned for optimal performance. Azure AutoML automates the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
Azure AutoML
Azure Automated Machine Learning is a cloud-based environment for training, deploying, automating, managing and tracking ML models. Automated machine learning picks an algorithm and hyperparameters for you and generates a model ready for deployment.
Automated ML democratizes the machine learning model development process, and empowers its users, no matter their data science expertise, to identify an end-to-end machine learning pipeline for any problem.
Data scientists, analysts, and developers across industries can use automated ML to:
- Implement ML solutions without extensive programming knowledge
- Save time and resources
- Leverage data science best practices
- Provide agile problem-solving
Benefits for companies:
- Automatically build and deploy predictive models using the no-code UI or through a code-first notebook experience.
- Increase productivity with easy data exploration and profiling and with intelligent feature engineering.
- Easily create accurate models customized to your data and refined by a wide array of algorithms and hyperparameters.
- Build responsible AI solutions with model interpretability, and fine-tune your models to improve accuracy.
Azure AutoML: classification, regression, and forecasting
Azure Automated Machine Learning can be applied to classification, regression, and forecasting tasks. Classification is a common machine learning task. Azure Machine Learning offers featurizations specifically for these tasks, such as deep neural network text featurizers for classification. Common classification examples include fraud detection, handwriting recognition, and object detection.
Regression is also a common supervised learning task, and Azure Machine Learning offers featurizations specifically for it. Unlike classification, where predicted output values are categorical, regression models predict numerical output values based on independent predictors. In regression, the objective is to help establish the relationship among those independent predictor variables by estimating how one variable impacts the others. For example, a regression model can predict the price of an automobile from features such as gas mileage and safety rating.
Time-series forecasting is an integral part of any business, whether it’s revenue, inventory, sales, or customer demand. Automated ML can be used to combine techniques and approaches and get a recommended, high-quality time-series forecast.
An automated time-series experiment is treated as a multivariate regression problem. Past time-series values are “pivoted” to become additional dimensions for the regressor together with other predictors. Unlike classical time-series methods, this approach naturally incorporates multiple contextual variables and their relationship to one another during training. Automated ML learns a single, but often internally branched, model for all items in the dataset and all prediction horizons. More data is thus available to estimate model parameters, and generalization to unseen series becomes possible.
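To make the "pivoting" idea concrete, here is a minimal sketch in plain Python. It is not the service's implementation: the function name `make_lag_features` and the sales figures are invented for illustration. It only shows how past values of a series can become feature columns for an ordinary regressor.

```python
def make_lag_features(series, lags):
    """Pivot past values of a series into one feature row per time step.

    Returns (rows, targets): rows[i] holds the lagged values for targets[i].
    """
    max_lag = max(lags)
    rows, targets = [], []
    # Start at max_lag so every requested lag has a value to look back to.
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


# Hypothetical daily sales figures, invented for illustration.
sales = [10, 12, 13, 15, 18, 21]
X, y = make_lag_features(sales, lags=[1, 2])
for features, target in zip(X, y):
    print(features, "->", target)
```

Each row of `X` can then sit alongside other predictors (price, holiday flags, and so on) in a standard regression table.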
Advanced forecasting configuration includes:
- holiday detection and featurization
- time-series and DNN learners (Auto-ARIMA, Prophet, ForecastTCN)
- many-models support through grouping
- rolling-origin cross-validation
- configurable lags
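Of the options above, rolling-origin cross-validation is the easiest to misread, so a small sketch may help. It assumes an expanding training window and a fixed forecast horizon. The function name and parameters are hypothetical, and the service's own splitting logic may differ.

```python
def rolling_origin_splits(n_samples, n_folds, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    first_origin = n_samples - n_folds * horizon
    if first_origin < 1:
        raise ValueError("not enough samples for the requested folds and horizon")
    for fold in range(n_folds):
        # The origin moves forward by one horizon per fold; training data
        # always precedes the test window in time.
        origin = first_origin + fold * horizon
        yield list(range(origin)), list(range(origin, origin + horizon))


splits = list(rolling_origin_splits(n_samples=10, n_folds=3, horizon=2))
for train, test in splits:
    print("train up to index", train[-1], "-> test", test)
```

Unlike shuffled k-fold splitting, no fold ever trains on observations that come after the ones it is tested on.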
AutoML workflow
During training, Azure Machine Learning creates a number of pipelines in parallel that try different algorithms and parameters for you. The service iterates through ML algorithms paired with feature selections, and each iteration produces a model with a training score. The higher the score, the better the model is considered to “fit” the data. Training stops once the experiment hits its defined exit criteria.
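The loop described here can be sketched in a few lines. This is a toy, sequential stand-in, not the service's parallel search: the candidates are simple threshold rules, the score is accuracy, and the names `search`, `max_iterations`, and `target_score` are assumptions chosen for illustration.

```python
def accuracy(threshold, xs, labels):
    """Fraction of points classified correctly by the rule x >= threshold."""
    hits = sum((x >= threshold) == label for x, label in zip(xs, labels))
    return hits / len(xs)


def search(candidates, xs, labels, max_iterations, target_score):
    """Score candidates in order; stop at either exit criterion."""
    best_threshold, best_score, history = None, float("-inf"), []
    for iteration, threshold in enumerate(candidates, start=1):
        score = accuracy(threshold, xs, labels)
        history.append((threshold, score))
        if score > best_score:
            best_threshold, best_score = threshold, score
        # Exit criteria: good enough score, or iteration budget exhausted.
        if score >= target_score or iteration >= max_iterations:
            break
    return best_threshold, best_score, history


xs = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]
labels = [False, False, False, True, True, True]
best, score, history = search(
    [0.2, 0.4, 0.5, 0.6, 0.8], xs, labels, max_iterations=10, target_score=1.0
)
print(best, score, len(history))
```

In this run the search stops after the third candidate because it reaches the target score, so the remaining candidates are never evaluated.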
Using Azure Machine Learning, design and run automated ML training experiments with these steps:
- Identify the ML problem to be solved: classification, forecasting, or regression
- Choose whether to use the Python SDK or the studio web experience
- Specify the source and format of the labeled training data: NumPy arrays or a Pandas DataFrame
- Configure the compute target for model training, such as your local computer, Azure Machine Learning Computes, remote VMs, or Azure Databricks.
- Configure the automated machine learning parameters that determine how many iterations to run over different models, which hyperparameter settings and advanced preprocessing/featurization to use, and which metrics to look at when determining the best model.
- Submit the training run.
- Review the results
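As a rough illustration of the configuration step, the sketch below collects typical settings in a plain dictionary. The keys mirror keyword arguments of `AutoMLConfig` in the Azure ML Python SDK (v1) as best understood here. The values, the column name, and the experiment name are invented, and the submission calls are left commented out, so treat all of it as an assumption to check against the current SDK documentation.

```python
# Keyword arguments mirroring AutoMLConfig in the Azure ML Python SDK (v1).
# Names and values are assumptions for illustration; check the current SDK docs.
automl_settings = {
    "task": "classification",          # or "regression" / "forecasting"
    "primary_metric": "AUC_weighted",  # metric used to rank models
    "iterations": 20,                  # exit criterion: number of pipelines to try
    "experiment_timeout_hours": 1,     # exit criterion: wall-clock budget
    "n_cross_validations": 5,
    "featurization": "auto",
    "label_column_name": "label",      # hypothetical column name
}

# With the SDK installed and a workspace configured, submission would look
# roughly like this (left commented out so the sketch runs anywhere):
# from azureml.core import Workspace, Experiment
# from azureml.train.automl import AutoMLConfig
# ws = Workspace.from_config()
# config = AutoMLConfig(training_data=train_dataset,
#                       compute_target=compute, **automl_settings)
# run = Experiment(ws, "automl-demo").submit(config, show_output=True)
# best_run, fitted_model = run.get_output()

for name, value in automl_settings.items():
    print(f"{name} = {value!r}")
```

Here `train_dataset` and `compute` are placeholders for a registered dataset and a compute target that you would create beforehand.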
Model Configuration
There are several configuration options available for automated machine learning experiments:
- Select your experiment type: Classification, Regression, or Time Series Forecasting
- Data source, formats, and fetch data
- Choose your compute target: local or remote
- Automated machine learning experiment settings
- Run an automated machine learning experiment
- Explore model metrics
- Register and deploy model
Automated machine learning supports data that resides on your local desktop or in the cloud such as Azure Blob Storage. The data can be read into a Pandas DataFrame or an Azure Machine Learning TabularDataset.
Feature engineering
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.
In every automated machine learning experiment, data is automatically scaled or normalized to help algorithms perform well. During model training, one of several scaling or normalization techniques is applied to each model. Additional feature engineering techniques, such as encoding and transforms, are also available.
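The idea that featurization is learned once during training and then replayed on new inputs can be sketched as follows. This is an illustration only: the class name `StandardFeaturizer` is hypothetical, and mean imputation plus standard scaling stand in for whichever techniques the service actually selects.

```python
from statistics import mean, pstdev


class StandardFeaturizer:
    """Mean-impute missing values, then scale to zero mean and unit variance."""

    def fit(self, values):
        # Statistics are learned from the observed training values only.
        observed = [v for v in values if v is not None]
        self.mean_ = mean(observed)
        self.std_ = pstdev(observed) or 1.0  # avoid dividing by zero
        return self

    def transform(self, values):
        # The same learned statistics are reused for any later input.
        filled = [self.mean_ if v is None else v for v in values]
        return [(v - self.mean_) / self.std_ for v in filled]


featurizer = StandardFeaturizer().fit([2.0, 4.0, None, 6.0])
train_scaled = featurizer.transform([2.0, 4.0, None, 6.0])
new_scaled = featurizer.transform([8.0, None])
print(train_scaled)
print(new_scaled)
```

Because `transform` never recomputes the mean or standard deviation, inputs at prediction time are treated exactly as the training data was.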
Ensemble models
Automated machine learning supports ensemble models, which are enabled by default. Ensemble learning improves machine learning results and predictive performance by combining multiple models as opposed to using single models. The ensemble iterations appear as the final iterations of the run. Automated machine learning uses both voting and stacking ensemble methods for combining models:
- Voting: predicts based on the weighted average of predicted class probabilities (for classification tasks) or predicted regression targets (for regression tasks).
- Stacking: combines heterogeneous models and trains a meta-model based on the output from the individual models. The current default meta-models are LogisticRegression for classification tasks and ElasticNet for regression/forecasting tasks.
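The voting method for classification can be sketched as a weighted average of class probabilities. The function name, the model outputs, and the weights below are invented for illustration. Stacking is not shown, since it requires training a meta-model on the base models' outputs.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    predicted_class = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted_class


# Three hypothetical models scoring one sample over two classes.
model_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
averaged, label = soft_vote(model_probs, weights=[1, 1, 2])
print(averaged, label)
```

Giving the third model a weight of 2 lets a stronger model count for more in the final prediction than its peers.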
Many models
The Many Models Solution Accelerator (preview) builds on Azure Machine Learning and enables you to use automated ML to train, operate, and manage hundreds or even thousands of machine learning models.
For example, building a model for each instance or individual in the following scenarios can lead to improved results:
- Predicting sales for each individual store
- Predictive maintenance for hundreds of oil wells
- Tailoring an experience for individual users.
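The pattern behind these scenarios is to partition the data by a grouping key and fit one model per partition. In the sketch below, a per-group mean stands in for a full automated ML run. The function name and the store records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_per_group(records):
    """Fit one trivial model (the group's mean target) per group key."""
    grouped = defaultdict(list)
    for key, target in records:
        grouped[key].append(target)
    # In a real setup, each group would get its own training run instead.
    return {key: mean(targets) for key, targets in grouped.items()}


# Hypothetical (store, daily sales) records.
records = [("store_a", 100), ("store_a", 120), ("store_b", 40), ("store_b", 60)]
models = train_per_group(records)
print(models)
```

Because each group is trained independently, the work parallelizes naturally across compute nodes.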
AutoML on Databricks
AutoML on Databricks automates machine learning pipelines, from feature engineering and model search to hyperparameter tuning and inference, while providing data scientists with the flexibility and control they need.
Databricks automates various steps of the data science workflow including augmented data preparation, visualization, feature engineering, hyperparameter tuning, model search, and finally automatic model tracking, reproducibility, and deployment, through a combination of native product offerings, partnerships, and custom solutions for a fully controlled and transparent AutoML experience.
Benefits
- Scalability: Automatically scale up and down your workloads and speed up training time with out-of-the-box optimizations for the most popular ML frameworks.
- Control: Choose algorithms better suited for the task in either a single-node or multi-node environment, and limit the number of runs to keep costs down.
- Ease of use: Automatically log results with MLflow tracking and parallelize hyperparameter search with Hyperopt on Databricks.
- Unification: Run all AutoML steps on the same platform, from ETL to model training and inference, securely, collaboratively, and at scale.
Features
- MLflow Experiments Tracking: Track, compare, and visualize hundreds of thousands of experiments using open source or Managed MLflow.
- Automated Hyperparameter Tuning for Distributed Machine Learning: Deep integration with PySpark MLlib’s Cross Validation to automatically track MLlib experiments in MLflow.
- Automated Hyperparameter Tuning for Single-node Machine Learning: Optimized and distributed hyperparameter search with enhanced Hyperopt and automated tracking to MLflow.
- Automated Model Search for Single-node Machine Learning: Optimized and distributed conditional hyperparameter search with enhanced Hyperopt and automated tracking to MLflow.
- Databricks Labs AutoML Toolkit: Automated end-to-end model building pipeline is available via Databricks Labs custom solutions.
AutoML & ONNX
With Azure Machine Learning, use automated ML to build a Python model and have it converted to the ONNX format. Once the models are in the ONNX format, they can be run on a variety of platforms and devices.
ONNX Runtime also supports C#, so you can use the automatically built model in C# apps without recoding and without the network latency that REST endpoints introduce.
About Microsoft Azure
Microsoft Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers. It provides software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS) and supports many different programming languages, tools, and frameworks, including both Microsoft-specific and third-party software and systems.