Daily Dose of Data Science – Day 7 – Machine Learning made easy with FLAML

It’s time for today’s Daily Dose of Data Science! Automated Machine Learning (AutoML) has been a hot topic for the last two years. Today, you will get to know FLAML (Fast and Lightweight AutoML), an AutoML library from Microsoft. Let’s dive in!

FLAML is a lightweight Python library that helps you run automated machine learning on your dataset efficiently and economically. In most AutoML frameworks, users may still have to choose the learner (algorithm) themselves and sometimes set ranges for its hyperparameters; FLAML makes this easier by selecting both the learners and the hyperparameters for each learner entirely on its own. AutoML is also usually slow, depending on the dataset, and can require substantial computational resources to try out many hyperparameter combinations before picking the best one. FLAML, in contrast, is designed to be fast and computationally inexpensive. Its lightweight design also makes it easy to extend with customized learners or metrics.
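To make "selecting learners and hyperparameters under a compute budget" concrete, here is a minimal stdlib-only sketch. It is plain random search, not FLAML's algorithm; the learner names, ranges and `toy_validation_loss` are hypothetical stand-ins for training and validating real models, and `time_budget` plays the same role as the `time_budget` setting in the examples further down.

```python
import random
import time


def budgeted_search(search_spaces, evaluate, time_budget=1.0, max_trials=200, seed=0):
    """Random search over (learner, hyperparameter) pairs within a time budget.

    search_spaces maps a learner name to {hyperparameter: (low, high)}.
    evaluate(learner, config) returns a validation loss (lower is better).
    Returns ((best_loss, best_learner, best_config), number_of_trials).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget
    learners = sorted(search_spaces)
    best = None
    trials = 0
    while trials < max_trials and time.monotonic() < deadline:
        learner = rng.choice(learners)  # pick a learner...
        config = {name: rng.uniform(low, high)  # ...and sample its hyperparameters
                  for name, (low, high) in search_spaces[learner].items()}
        loss = evaluate(learner, config)
        trials += 1
        if best is None or loss < best[0]:
            best = (loss, learner, config)
    return best, trials


# Hypothetical learners, ranges and loss; a real run would train and validate models.
SPACES = {
    "tree": {"max_depth": (1.0, 12.0)},
    "linear": {"log_c": (-3.0, 3.0)},
}


def toy_validation_loss(learner, config):
    if learner == "tree":
        return 0.05 + abs(config["max_depth"] - 6.0) / 6.0
    return 0.10 + abs(config["log_c"] - 1.0) / 3.0


best, n_trials = budgeted_search(SPACES, toy_validation_loss, time_budget=0.5)
print(n_trials, best)
```

Random search treats every trial as equally expensive, which is exactly the weakness FLAML's cost-aware strategy addresses.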

Microsoft Research designed a new, cost-effective method for hyperparameter optimization and learner selection, which is one of the core benefits of FLAML. It leverages the structure of the search space to choose a search order optimized for both cost and error. For example, the system tends to propose cheap configurations at the beginning of the search but quickly moves to configurations with higher model complexity and larger sample sizes when needed in later stages. Similarly, it favors simple learners at the beginning, following the principle of Occam’s Razor, but penalizes them later if their error improves slowly. This cost-bounded search and cost-based prioritization make a big difference in search efficiency under budget constraints.
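The "start cheap, grow only when it pays off" idea can be illustrated with a toy randomized direct search: begin at the cheapest corner of a log-scale search space, probe a random direction and then its opposite, and move only when the error drops. This mirrors the cost-frugal idea described above but is not FLAML's code; the error and cost functions, bounds and step size are all hypothetical, and refinements such as step-size adaptation and learner selection are left out.

```python
import math
import random


def toy_error(x):
    # Hypothetical validation error over (log2 n_estimators, log2 max_leaves);
    # it is lowest around 2**6 trees with 2**5 leaves.
    return 0.1 + 0.01 * ((x[0] - 6.0) ** 2 + (x[1] - 5.0) ** 2)


def toy_cost(x):
    # Hypothetical training cost: grows with n_estimators * max_leaves.
    return 2.0 ** x[0] * 2.0 ** x[1]


def cost_frugal_search(start, lower, upper, step=1.0, n_iters=60, seed=0):
    """Randomized direct search that starts from a cheap configuration.

    A move toward a larger (more expensive) configuration is accepted only if
    it lowers the error, so cost grows only when it pays off.
    """
    rng = random.Random(seed)
    x = [float(v) for v in start]
    best_err = toy_error(x)
    total_cost = toy_cost(x)
    path = [tuple(x)]
    for _ in range(n_iters):
        # Random unit direction in the 2-D log-scale search space.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(angle), math.sin(angle))
        for sign in (1.0, -1.0):  # try x + step*u first, then x - step*u
            cand = [min(max(xi + sign * step * ui, lo), hi)
                    for xi, ui, lo, hi in zip(x, u, lower, upper)]
            err = toy_error(cand)
            total_cost += toy_cost(cand)  # every evaluation is paid for
            if err < best_err:
                x, best_err = cand, err
                path.append(tuple(x))
                break
    return x, best_err, total_cost, path


x, err, cost, path = cost_frugal_search(start=(2, 2), lower=(2, 2), upper=(10, 10))
print(path[0], "->", tuple(round(v, 2) for v in x), round(err, 4), round(cost))
```

The accepted path starts at the cheapest configuration and only drifts toward more trees and leaves while the error keeps improving.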

FLAML also has a .NET implementation, used in ML.NET Model Builder. This ML.NET blog post describes the improvements FLAML brought.

Setting up FLAML is very easy and can be done using the pip installer.

pip install flaml
pip install "flaml[notebook]"  # optional extras needed to run the Jupyter notebook examples

Some of its key advantages are listed below:

  • For common ML tasks like classification and regression, find quality models with small computational resources.
  • Users can choose their desired customizability: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), full customization (arbitrary training and evaluation code).
  • Allows human guidance in hyperparameter tuning: it respects priors on certain subspaces while still being able to explore other subspaces. Read more about the hyperparameter optimization methods in FLAML here. They can be used beyond the AutoML context, including in distributed HPO frameworks such as Ray Tune or NNI.
  • Supports online AutoML: automatic hyperparameter tuning for online learning algorithms. Read more about the online AutoML method in FLAML here.
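The "human guidance" point can be sketched as a configuration stream that first yields the user's suggested configurations (the prior) and then keeps sampling from the whole space, so other subspaces still get explored. This is a stdlib-only illustration under my own assumptions; `guided_configs`, the space and the hints are hypothetical, and in FLAML itself this is done by passing starting points to the tuner (check the documentation for the exact argument names).

```python
import random
from itertools import islice


def guided_configs(space, suggested, seed=0):
    """Yield user-suggested configs first, then random configs from the whole space.

    space maps a hyperparameter name to a list of allowed values.
    suggested is a list of partial or full configs reflecting prior knowledge.
    """
    rng = random.Random(seed)
    for hint in suggested:
        # Fill any hyperparameter the user did not specify with a random value.
        yield {name: hint.get(name, rng.choice(values)) for name, values in space.items()}
    while True:  # keep exploring beyond the user's preferred region
        yield {name: rng.choice(values) for name, values in space.items()}


SPACE = {"learner": ["lgbm", "rf", "xgboost"], "max_depth": [2, 4, 8, 16]}
PRIOR = [{"learner": "lgbm", "max_depth": 4}, {"learner": "lgbm"}]
first_five = list(islice(guided_configs(SPACE, PRIOR), 5))
print(first_five)
```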

All the above details have been taken from the official GitHub repository of FLAML. Please visit the project homepage to learn more about this framework in detail. The following usage examples are also taken from the GitHub page:

# For Time Series Forecasting
# pip install flaml[forecast]
import numpy as np
from flaml import AutoML
# 84 monthly timestamps: 2014-01 through 2020-12
X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
y_train = np.random.random(size=72)  # random values for the first 72 months
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamps (first 72 months)
           y_train=y_train,  # value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='forecast', time_budget=15,  # time budget in seconds
           log_file_name="forecast.log",  # written to the current directory
          )
# Forecast the remaining 12 months (2020-01 to 2020-12)
print(automl.predict(X_train[72:]))
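Once the actual values for the forecast horizon are known, a common way to score a forecast like this one is the mean absolute percentage error (MAPE). A stdlib sketch; the `actual` and `predicted` numbers are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.25 means 25%).

    Undefined when an actual value is zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers standing in for held-out actuals and a model's forecast.
actual = [100.0, 200.0, 400.0]
predicted = [150.0, 150.0, 400.0]
print(mape(actual, predicted))  # → 0.25
```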

##############################################################################

# For Regression

from flaml import AutoML
# load_boston was removed in scikit-learn 1.2, so the bundled diabetes dataset is used here
from sklearn.datasets import load_diabetes
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'r2',
    "task": 'regression',
    "log_file_name": "diabetes.log",  # written to the current directory
}
X_train, y_train = load_diabetes(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict (on the training data, for illustration only)
print(automl.predict(X_train))
# Export the best model
print(automl.model)
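The `"metric": 'r2'` setting refers to the coefficient of determination, R² = 1 − SS_res / SS_tot. For intuition, here is a stdlib sketch of that formula; in practice scikit-learn's `sklearn.metrics.r2_score` is the usual choice.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y_true, y_true))                # → 1.0 (perfect predictions)
print(r2_score(y_true, [6.0, 6.0, 6.0, 6.0]))  # → 0.0 (always predicting the mean)
```

A score of 1.0 is a perfect fit, 0.0 is no better than predicting the mean, and negative values are worse than that.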

#############################################################################

# For Classification

from flaml import AutoML
from sklearn.datasets import load_iris
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "iris.log",  # written to the current directory
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict class probabilities (on the training data, for illustration only)
print(automl.predict_proba(X_train))
# Export the best model
print(automl.model)
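The example prints class probabilities from `predict_proba`, while `automl.predict` returns class labels. To show how the two relate, and what the 'accuracy' metric measures, here is a stdlib sketch with made-up probabilities, assuming classes are encoded as 0..k-1 in column order (as in the iris data).

```python
def proba_to_labels(proba_rows):
    """Pick the column with the highest probability in each row (argmax; ties go to the first)."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up probabilities for 4 samples and 3 classes.
proba = [[0.8, 0.1, 0.1],
         [0.2, 0.7, 0.1],
         [0.3, 0.3, 0.4],
         [0.5, 0.4, 0.1]]
y_true = [0, 1, 2, 1]
labels = proba_to_labels(proba)
print(labels)                    # → [0, 1, 2, 0]
print(accuracy(y_true, labels))  # → 0.75
```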

##############################################################################

Please take a look at the official documentation of FLAML:

  • API documentation here.
  • Demos and tutorials of FLAML are available here.

For more technical details, please check the papers.

I strongly recommend FLAML for standard, lightweight machine learning tasks on common types of datasets; it can also be used on more complex datasets to get quick benchmark models. Do try it out yourself!

That’s all for today’s dose, folks! Stay tuned for another Daily Dose of Data Science, and please feel free to like, share, comment and subscribe to my posts if you find them helpful!
