Management Summary

Deploying and monitoring machine learning projects is a complex undertaking. In addition to the consistent documentation of model parameters and the associated evaluation metrics, the main challenge is to transfer the desired model into a productive environment. If several people are involved in the development, additional synchronization problems arise concerning the models’ development environments and version statuses. For this reason, tools for the efficient management of model results through to extensive training and inference pipelines are required. In this article, we present the typical challenges along the machine learning workflow and describe a possible solution platform with MLflow. In addition, we present three different scenarios that can be used to professionalize machine learning workflows:

  1. Entry-level Variant: Model parameters and performance metrics are logged via an R/Python API and clearly presented in a GUI. In addition, the trained models are stored as artifacts and can be made available via APIs.
  2. Advanced Model Management: In addition to tracking parameters and metrics, certain models are logged and versioned. This enables consistent monitoring and simplifies the deployment of selected model versions.
  3. Collaborative Workflow Management: Encapsulating Machine Learning projects as packages or Git repositories and the accompanying local reproducibility of development environments enable smooth development of Machine Learning projects with multiple stakeholders.

Depending on the maturity of your machine learning project, these three scenarios can serve as inspiration for a potential machine learning workflow. We have elaborated each scenario in detail for better understanding and provide recommendations regarding the APIs and deployment environments to use.

Challenges Along the Machine Learning Workflow

Training machine learning models is becoming easier and easier. Meanwhile, a variety of open-source tools enable efficient data preparation as well as increasingly simple model training and deployment.

The added value for companies comes primarily from the systematic interaction of model training, in the form of model identification, hyperparameter tuning and fitting on the training data, and deployment, i.e., making the model available for inference tasks. This interaction is often not established as a continuous process, especially in the early phases of a machine learning initiative. However, a model can only generate added value in the long term if a stable production process is implemented from model training, through its validation, to testing and deployment. If this process is not implemented correctly, complex dependencies and costly maintenance work can arise during the operational start-up of the model [2]. The following risks are particularly noteworthy in this regard.

1. Ensuring Synchronicity

Often, in an exploratory context, data preparation and modeling workflows are developed locally. Different configurations of development environments, or even the use of different technologies, make it difficult to reproduce results, especially between developers or teams. In addition, there are potential compatibility issues within the workflow if several scripts must be executed in a logical sequence. Without appropriate version control logic, subsequent synchronization can only be achieved with great effort.

2. Documentation Effort

To evaluate the performance of the model, model metrics are often calculated following training. These depend on various factors, such as the parameterization of the model or the influencing factors used. This meta-information about the model is often not stored centrally. However, for systematic further development and improvement of a model, it is mandatory to have an overview of the parameterization and performance of all past training runs.

3. Heterogeneity of Model Formats

In addition to managing model parameters and results, there is the challenge of subsequently transferring the model to the production environment. If different models from multiple packages are used for training, deployment can quickly become cumbersome and error-prone due to different packages and versions.

4. Recovery of Prior Results

In a typical machine learning project, the situation often arises that a model is developed over a long period of time. For example, new features may be used, or entirely new architectures may be evaluated. These experiments do not necessarily lead to better results. If experiments are not versioned cleanly, there is a risk that old results can no longer be reproduced.

Various tools have been developed in recent years to solve these and other challenges in the handling and management of machine learning workflows, such as TensorFlow Extended (TFX), Cortex, Marvin, or MLflow. The latter, in particular, is currently one of the most widely used solutions.

MLflow is an open-source project with the goal of combining the best of existing ML platforms to make integration with existing ML libraries, algorithms, and deployment tools as straightforward as possible [3]. In the following, we will introduce the main MLflow modules and discuss how machine learning workflows can be mapped via MLflow.

MLflow Services

MLflow consists of four components: MLflow Tracking, MLflow Models, MLflow Projects, and MLflow Registry. Depending on the requirements of the experimental and deployment scenario, all services can be used together, or individual components can be isolated.

With MLflow Tracking, all hyperparameters, metrics (model performance), and artifacts, such as charts, can be logged. MLflow Tracking provides the ability to collect presets, parameters, and results for collective monitoring for each training or scoring run of a model. The logged results can be visualized in a GUI or alternatively accessed via a REST API.
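
As a minimal sketch, logging with the Python API could look like this (the tracking URI, experiment name, and logged values are placeholders):

import mlflow

# Point the client at a (local or remote) tracking server -- placeholder URI
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("example_experiment")

with mlflow.start_run():
    # Log hyperparameters and evaluation metrics of this run
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.92)
    # Arbitrary files, e.g., charts, can be logged as artifacts
    # (assumes the file exists in the working directory)
    mlflow.log_artifact("roc_curve.png")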

The MLflow Models module acts as an interface between technologies and enables simplified deployment. Depending on its type, a model is stored in a specific format, e.g., as a plain Python function or as a Keras or H2O model; MLflow refers to these formats as model flavors. Furthermore, MLflow Models provides support for model deployment on various machine learning cloud services, e.g., AzureML and Amazon SageMaker.

MLflow Projects are used to encapsulate individual ML projects in a package or Git repository. The basic configuration of the respective environment is defined via a YAML file; it controls, for example, how the conda environment that is created when the MLflow project is executed is parameterized. MLflow Projects allows experiments that have been developed locally to be executed on other computers in the same environment. This is an advantage, for example, when developing in smaller teams.

MLflow Registry provides a centralized model management. Selected MLflow models can be registered and versioned in it. A staging workflow enables a controlled transfer of models into the productive environment. The entire process can be controlled via a GUI or a REST API.

Examples of Machine Learning Pipelines Using MLflow

In the following, three different ML workflow scenarios are presented using the above MLflow modules. These increase in complexity from scenario to scenario. In all scenarios, a dataset is loaded into a development environment using a Python script, processed, and a machine learning model is trained. The last step in all scenarios is a deployment of the ML model in an exemplary production environment.

1. Scenario – Entry-Level Variant

Scenario 1 – Simple Metrics Tracking

Scenario 1 uses the MLflow Tracking and MLflow Models modules. Using the Python API, the model parameters and metrics of the individual runs can be stored on the MLflow Tracking Server Backend Store, and the corresponding MLflow Model File can be stored as an artifact on the MLflow Tracking Server Artifact Store. Each run is assigned to an experiment. For example, an experiment could be called ‘fraud_classification’, and a run would be a specific ML model with a certain hyperparameter configuration and the corresponding metrics. Each run is stored with a unique RunID.
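
A run for the ‘fraud_classification’ experiment could be logged roughly as follows; the toy data, model choice, and parameter values below are purely illustrative and not part of the original setup:

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data as a stand-in for the actual training data
X, y = make_classification(n_samples=1000, n_features=10)
X_train, X_test, y_train, y_test = train_test_split(X, y)

mlflow.set_experiment("fraud_classification")

with mlflow.start_run() as run:
    model = SVC(C=1.0, kernel="rbf").fit(X_train, y_train)

    # Log hyperparameters and the resulting test metric
    mlflow.log_param("C", 1.0)
    mlflow.log_param("kernel", "rbf")
    mlflow.log_metric("accuracy", model.score(X_test, y_test))

    # Stores the MLmodel file, conda.yaml, and model.pkl in the artifact store
    mlflow.sklearn.log_model(model, artifact_path="model")

    print("RunID:", run.info.run_id)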

MLflow Tool – Figure 01

In the screenshot above, the MLflow Tracking UI is shown as an example after executing a model training. The server is hosted locally in this example; of course, it is also possible to host the server remotely, for example in a Docker container within a virtual machine. In addition to the parameters and model metrics, the time of the model training, as well as the user and the name of the underlying script, are also logged. Clicking on a specific run also displays additional information, such as the RunID and the model training duration.

MLflow Tool – Figure 02

If you have logged other artifacts in addition to the metrics, such as the model, the MLflow Model Artifact is also displayed in the Run view. In the example, a model from the sklearn.svm package was used. The MLmodel file contains metadata with information about how the model should be loaded. In addition to this, a conda.yaml is created that contains all the package dependencies of the environment at training time. The model itself is located as a serialized version under model.pkl and contains the model parameters optimized on the training data.

MLflow Tool – Figure 03

The deployment of the trained model can now be done in several ways. For example, suppose one wants to deploy the model with the best accuracy metric. In that case, the MLflow tracking server can be accessed via the Python API mlflow.list_run_infos to identify the RunID of the desired model. Now, the path to the desired artifact can be assembled, and the model loaded via, for example, the Python package pickle. This workflow can now be triggered via a Dockerfile, allowing flexible deployment to the infrastructure of your choice. MLflow offers additional separate APIs for deployment on Microsoft Azure and AWS. For example, if the model is to be deployed on AzureML, an Azure ML container image can be created using the Python API mlflow.azureml.build_image, which can be deployed as a web service to Azure Container Instances or Azure Kubernetes Service. In addition to the MLflow Tracking Server, it is also possible to use other storage systems for the artifact, such as Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP Server, NFS, and HDFS.
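
As a sketch of this lookup step – using mlflow.search_runs as an alternative to mlflow.list_run_infos, and assuming the experiment and artifact path from the example above – the best run could be identified and its model loaded like this:

import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://localhost:5000")  # placeholder tracking server

# Find the run with the best accuracy within the experiment
client = MlflowClient()
experiment = client.get_experiment_by_name("fraud_classification")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
)
best_run_id = runs.iloc[0]["run_id"]

# Load the logged MLflow Model directly via its runs:/ URI
model = mlflow.sklearn.load_model("runs:/{}/model".format(best_run_id))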

2. Scenario – Advanced Model Management

Scenario 2 – Advanced Model Management

Scenario 2 includes, in addition to the modules used in scenario 1, the MLflow Model Registry as a model management component. Here, models logged in specific runs can be registered and managed. These steps can be controlled via the API or GUI. A basic requirement for using the Model Registry is running the MLflow Tracking Server with a database-backed backend store. To register a model via the GUI, select a specific run and scroll to the artifact overview.

MLflow Tool – Figure 04

Clicking on Register Model opens a new window in which a model can be registered. If you want to register a new version of an already existing model, select the desired model from the dropdown field. Otherwise, a new model can be created at any time. After clicking the Register button, the previously registered model appears in the Models tab with corresponding versioning.

MLflow Tool – Figure 05

Each model includes an overview page that shows all past versions. This is useful, for example, to track which models were in production when.

MLflow Tool – Figure 06

If you now select a model version, you will get to an overview where, for example, a model description can be added. The Source Run link also takes you to the run from which the model was registered. Here you will also find the associated artifact, which can be used later for deployment.

MLflow Tool – Figure 07

In addition, individual model versions can be categorized into defined phases in the Stage area. This feature can be used, for example, to determine which model is currently being used in production or is to be transferred there. For deployment, in contrast to scenario 1, versioning and staging status can be used to identify and deploy the appropriate model. For this, the Python API MlflowClient().search_model_versions can be used, for example, to filter the desired model and its associated RunID. Similar to scenario 1, deployment can then be completed to, for example, AWS Sagemaker or AzureML via the respective Python APIs.
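
A minimal sketch of this registry-based lookup could look as follows; the registered model name 'fraud_classifier' is hypothetical:

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# List all registered versions of the model together with their current stage
for mv in client.search_model_versions("name='fraud_classifier'"):
    print(mv.version, mv.current_stage, mv.run_id)

# Load the version currently in the 'Production' stage via the models:/ URI
model = mlflow.pyfunc.load_model("models:/fraud_classifier/Production")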

3. Scenario – Collaborative Workflow Management

Scenario 3 – Full Workflow Management

In addition to the modules used in scenario 2, scenario 3 also includes the MLflow Projects module. As already explained, MLflow Projects are particularly well suited for collaborative work. Any Git repository or local environment can act as a project and be controlled by an MLproject file. Package dependencies are recorded in a conda.yaml that is referenced by the MLproject file when the project is started; the corresponding conda environment with all dependencies is then created before the model is trained and logged. This avoids the need for manual alignment of the development environments of all developers involved and also guarantees standardized and comparable results across all runs. The latter is especially necessary in the deployment context, since it cannot be guaranteed that different package versions produce the same model artifacts. Instead of a conda environment, a Docker environment can also be defined using a Dockerfile. This offers the advantage that package dependencies independent of Python can also be defined. Likewise, MLflow Projects allow the use of different commit hashes or branch names to run other project states, provided a Git repository is used.

An interesting use case is the modularized development of machine learning training pipelines [4]. For example, data preparation can be decoupled from model training and developed in parallel, while another team uses a different branch name to train the model. In this case, only a different branch name has to be passed as a parameter when starting the project. The final data preparation can then be pushed to the same branch used for model training and would thus already be fully integrated into the training pipeline. The deployment can also be controlled as a sub-module within the project pipeline through a Python script via the MLproject file and can be carried out analogously to scenario 1 or 2 on a platform of your choice.
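
A sketch of such a project start via the Python API; the repository URL, branch name, and parameter below are placeholders:

import mlflow

# Runs the 'main' entry point of the MLproject file in the given repository;
# 'version' selects the branch (or commit hash) to check out
mlflow.run(
    uri="https://github.com/your-org/ml-project",  # hypothetical repository
    entry_point="main",
    version="feature/data-prep",
    parameters={"alpha": 0.1},
)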

Conclusion and Outlook

MLflow offers a flexible way to make the machine learning workflow robust against the typical challenges in the daily life of a data scientist, such as synchronization problems due to different development environments or missing model management. Depending on the maturity level of the existing machine learning workflow, various services from the MLflow portfolio can be used to achieve a higher level of professionalization.

In this article, three machine learning workflows of ascending complexity were presented as examples. MLflow services can support everything from simple logging of results in an interactive UI to more complex, modular modeling pipelines. Naturally, there are also synergies with tools outside the MLflow ecosystem, such as Docker/Kubernetes for model scaling or Jenkins for CI/CD pipeline control. If there is further interest in MLOps challenges and best practices, I refer you to the webinar on MLOps by our CEO Sebastian Heinz, which we provide free of charge.

Resources

John Vicente

In a recent project at STATWORX, I developed a large-scale deep learning application for image classification using Keras and TensorFlow. After developing the model, we needed to deploy it in a quite complex pipeline of data acquisition and preparation routines in a cloud environment. We decided to deploy the model on a prediction server that exposes the model through an API. Thereby, we came across NVIDIA TensorRT Server (TRT Server), a serious alternative to good old TF Serving (which is an awesome product, by the way!). After checking the pros and cons, we decided to give TRT Server a shot. TRT Server has several advantages over TF Serving, such as optimized inference speed, easy model management and resource allocation, versioning, and parallel inference handling. Furthermore, TensorRT Server is not “limited” to TensorFlow (and Keras) models. It can serve models from all major deep learning frameworks, such as TensorFlow, MXNet, PyTorch, Theano, Caffe, and CNTK.

Despite the load of cool features, I found it a bit cumbersome to set up the TRT Server. The installation and documentation are scattered across quite a few repositories, documentation guides, and blog posts. That is why I decided to write this blog post about setting up the server and getting your predictions going!

NVIDIA TensorRT Server

TensorRT Inference Server is NVIDIA's cutting-edge server product for putting deep learning models into production. It is part of NVIDIA's TensorRT inference platform and provides a scalable, production-ready solution for serving your deep learning models from all major frameworks. It is based on NVIDIA Docker and contains everything that is required to run the server from inside the container. Furthermore, NVIDIA Docker allows for using GPUs inside a Docker container, which, in most cases, significantly speeds up model inference. Talking about speed – TRT Server can be considerably faster than TF Serving and allows for multiple inferences from multiple models at the same time, using CUDA streams to exploit GPU scheduling and serialization (see image below).

Visualization of model serialization and parallelism

With TRT Server you can specify the number of concurrent inference computations using so-called instance groups that can be configured on the model level (see the section “Model Configuration File”). For example, if you are serving two models and one model gets significantly more inference requests, you can assign more GPU resources to this model, allowing you to compute more requests in parallel. Furthermore, instance groups allow you to specify whether a model should be executed on CPU or GPU, which can be a very interesting feature in more complex serving environments. Overall, TRT Server has a bunch of great features that make it interesting for production usage.

NVIDIA architecture

The image above illustrates the general architecture of the server. You can see the HTTP and gRPC interfaces that allow you to integrate your models into other applications that are connected to the server over LAN or WAN. Pretty cool! Furthermore, the server exposes a couple of sanity features, such as health status checks, that also come in handy in production.

Setting up the Server

As mentioned before, TensorRT Server lives inside an NVIDIA Docker container. In order to get things going, you need to complete several installation steps (in case you are starting with a blank machine, like here). The overall process is quite long and requires a certain amount of “general cloud, network and IT knowledge”. I hope that the following steps make the installation and setup process clear to you.

Launch a Deep Learning VM on Google Cloud

For my project, I used a Google Deep Learning VM that comes with preinstalled CUDA and TensorFlow libraries. You can launch a cloud VM using the Google Cloud SDK or the GCP console (which is pretty easy to use, in my opinion). Instructions for installing the GCP SDK can be found here. Please note that it might take some time until you can connect to the VM because of the CUDA installation process, which takes several minutes. You can check the status of the VM in the cloud logging console.

# Create project
gcloud projects create tensorrt-server

# Start instance with deep learning image
gcloud compute instances create tensorrt-server-vm \
	--project tensorrt-server \
	--zone your-zone \
	--machine-type n1-standard-4 \
	--create-disk='size=50' \
	--image-project=deeplearning-platform-release \
	--image-family tf-latest-gpu \
	--accelerator='type=nvidia-tesla-k80,count=1' \
	--metadata='install-nvidia-driver=True' \
	--maintenance-policy TERMINATE

After successfully setting up your instance, you can SSH into the VM using the terminal. From there you can execute all the necessary steps to install the required components.

# SSH into instance
gcloud compute ssh tensorrt-server-vm --project tensorrt-server --zone your-zone

Note: Of course, you have to adapt the script for your project and instance names.

Install Docker

After setting up the GCP cloud VM, you have to install the Docker service on your machine. The Google Deep Learning VM uses Debian as OS. You can use the following code to install Docker on the VM.

# Install Docker
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce

You can verify that Docker has been successfully installed by running the following command.

sudo docker run --rm hello-world

You should see a “Hello from Docker!” message from the container, which should give you something like this:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Already exists 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Congratulations, you’ve just installed Docker successfully!

Install NVIDIA Docker

Unfortunately, Docker has no “out of the box” support for GPUs connected to the host system. Therefore, the installation of the NVIDIA Docker runtime is required to use TensorRT Server's GPU capabilities within a containerized environment. NVIDIA Docker is also required for TF Serving if you want to use your GPUs for model inference. The following figure illustrates the architecture of the NVIDIA Docker runtime.

NVIDIA docker

You can see that the NVIDIA Docker runtime is layered around the Docker engine, allowing you to use standard Docker as well as NVIDIA Docker containers on your system.

To download NVIDIA's container images, such as the TensorRT Server image, you have to register at NVIDIA GPU Cloud (NGC) and generate an API key. To authenticate against NGC, execute the following command in the server command line:

# Login to NGC
sudo docker login nvcr.io

You will be prompted for a username and password. The username is $oauthtoken, and the password is the generated API key. After you have successfully logged in, you can install the NVIDIA Docker components. Following the instructions on the NVIDIA Docker GitHub repo, you can install NVIDIA Docker by executing the following script (Ubuntu 14.04/16.04/18.04, Debian Jessie/Stretch).

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | 
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | 
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Installing TensorRT Server

The next step, after successfully installing NVIDIA Docker, is to install TensorRT Server. It can be pulled from the NVIDIA container registry (nvcr.io). Again, you need to be authenticated against NGC to perform this action.

# Pull TensorRT Server (make sure to check the current version)
sudo docker pull nvcr.io/nvidia/tensorrtserver:18.09-py3

After pulling the image, TRT Server is ready to be started on your cloud machine. The next step is to create a model that will be served by TRT Server.

Model Deployment

After installing the required technical components and pulling the TRT Server container, you need to take care of your model and its deployment. TensorRT Server manages its models in a folder on your server, the so-called model repository.

Setting up the Model Repository

The model repository contains your exported TensorFlow / Keras etc. model graphs in a specific folder structure. For each model in the model repository, a subfolder with the corresponding model name needs to be defined. Within those model subfolders, the model schema file (config.pbtxt), label definitions (labels.txt), as well as model version subfolders are located. Those subfolders allow you to manage and serve different model versions. The file labels.txt contains strings of the target labels in appropriate order, corresponding to the output layer of the model. Within the version subfolder a file named model.graphdef (the exported protobuf graph) is stored. model.graphdef is actually a frozen TensorFlow graph that is created after exporting a TensorFlow model and needs to be named accordingly.

Remark: I did not manage to get a working serving from a tensorflow.python.saved_model.simple_save() or tensorflow.python.saved_model.builder.SavedModelBuilder() export with TRT Server due to some variable initialization error. We therefore use the “freezing graph” approach, which converts all TensorFlow variables inside a graph to constants and outputs everything into a single file (which is model.graphdef).

/models
|-   model_1/
|--      config.pbtxt
|--      labels.txt
|--      1/
|---		model.graphdef

Since the model repository is just a folder, it can be located anywhere the TRT Server host has a network connection to. For example, you can store your exported model graphs in a cloud repository or a local folder on your machine. New models can be exported and deployed there in order to be servable through the TRT Server.

Model Configuration File

Within your model repository, the model configuration file (config.pbtxt) sets important parameters for each model on the TRT Server. It contains technical information about your servable model and is required for the model to be loaded properly. There are several things you can control here:

name: "model_1"
platform: "tensorflow_graphdef"
max_batch_size: 64
input [
   {
      name: "dense_1_input"
      data_type: TYPE_FP32
      dims: [ 5 ]
   }
]
output [
   {
      name: "dense_2_output"
      data_type: TYPE_FP32
      dims: [ 2 ]
      label_filename: "labels.txt"
   }
]
instance_group [
   {
      kind: KIND_GPU
      count: 4
   }
]

First, name defines the tag under which the model is reachable on the server. This has to be the name of your model folder in the model repository. platform defines the framework the model was built with. If you are using TensorFlow or Keras, there are two options: (1) tensorflow_savedmodel and (2) tensorflow_graphdef. As mentioned before, I used tensorflow_graphdef (see my remark at the end of the previous section). max_batch_size, as the name says, controls the maximum batch size for your predictions. input defines your model's input layer: its node name (yes, you should name your layers and nodes in TensorFlow or Keras), its data_type, currently only supporting numeric types such as TYPE_FP16, TYPE_FP32, and TYPE_FP64, and the input dims. Correspondingly, output defines your model's output layer name, its data_type, and dims. You can specify a labels.txt file that holds the labels of the output neurons in appropriate order. Since we only have two output classes here, the file simply looks like this:

class_0
class_1

Each row defines a single class label. Note that the file does not contain any header. The last section, instance_group, lets you assign specific GPU (KIND_GPU) or CPU (KIND_CPU) resources to your model. In the example file, there are 4 concurrent GPU threads assigned to the model, allowing for four simultaneous predictions.

Building a simple model for serving

In order to serve a model through TensorRT server, you’ll first need – well – a model. I’ve prepared a small script that builds a simple MLP for demonstration purposes in Keras. I’ve already used TRT Server successfully with bigger models such as InceptionResNetV2 or ResNet50 in production and it worked very well.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import InputLayer, Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils import to_categorical

# Make toy data
X, y = make_classification(n_samples=1000, n_features=5)

# Make target categorical
y = to_categorical(y)

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Scale inputs
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Model definition
model_1 = Sequential()
model_1.add(Dense(input_shape=(X_train.shape[1], ),
                  units=16, activation='relu', name='dense_1'))
model_1.add(Dense(units=2, activation='softmax', name='dense_2'))
model_1.compile(optimizer='adam', loss='categorical_crossentropy')

# Early stopping
early_stopping = EarlyStopping(patience=5)
model_checkpoint = ModelCheckpoint(filepath='model_checkpoint.h5',
                                   save_best_only=True,
                                   save_weights_only=True)
callbacks = [early_stopping, model_checkpoint]

# Fit model and load best weights
model_1.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),
            epochs=50, batch_size=32, callbacks=callbacks)

# Load best weights after early stopping
model_1.load_weights('model_checkpoint.h5')

# Export model
model_1.save('model_1.h5')

The script builds some toy data using sklearn.datasets.make_classification and fits a single layer MLP to the data. After fitting, the model gets saved for further treatment in a separate export script.

Freezing the graph for serving

Serving a Keras (TensorFlow) model works by exporting the model graph as a separate protobuf file (.pb-file extension). A simple way to export the model into a single file, that contains all the weights of the network, is to “freeze” the graph and write it to disk. Thereby, all the tf.Variables in the graph are converted to tf.constant which are stored together with the graph in a single file. I’ve modified this script for that purpose.

import os
import shutil
import keras.backend as K
import tensorflow as tf
from keras.models import load_model
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io

def freeze_model(model, path):
    """ Freezes the graph for serving as protobuf """
    # Remove folder if present, then recreate it and copy the config files
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.mkdir(path)
    shutil.copy('config.pbtxt', path)
    shutil.copy('labels.txt', path)
    # Disable Keras learning phase
    K.set_learning_phase(0)
    # Load model
    model_export = load_model(model)
    # Get Keras sessions
    sess = K.get_session()
    # Output node name
    pred_node_names = ['dense_2_output']
    # Dummy op to rename the output node
    dummy = tf.identity(input=model_export.outputs[0], name=pred_node_names[0])
    # Convert all variables to constants
    graph_export = graph_util.convert_variables_to_constants(
        sess=sess,
        input_graph_def=sess.graph.as_graph_def(),
        output_node_names=pred_node_names)
    graph_io.write_graph(graph_or_graph_def=graph_export,
                         logdir=path + '/1',
                         name='model.graphdef',
                         as_text=False)

# Freeze Model
freeze_model(model='model_1.h5', path='model_1')

# Upload to GCP
os.system('gcloud compute scp model_1 tensorrt-server-vm:~/models/ --project tensorrt-server --zone us-west1-b --recurse')

The freeze_model() function takes the path to the saved Keras model file model_1.h5 as well as the path for the graph to be exported. Furthermore, I've enhanced the function to build the required model repository folder structure containing the version subfolder, config.pbtxt, as well as labels.txt, both stored in my project folder. The function loads the model and exports the graph into the defined destination. In order to do so, you need to define the output node's name and then convert all variables in the graph to constants using graph_util.convert_variables_to_constants, which uses the respective Keras backend session that has to be fetched via K.get_session(). Furthermore, it is important to disable the Keras learning mode using K.set_learning_phase(0) prior to export. Lastly, I've included a small CLI command that uploads my model folder into the model repository /models on my GCP instance.

Starting the Server

Now that everything is installed, set up, and configured, it is (finally) time to launch our TRT prediction server. The following command starts the NVIDIA Docker container and maps the model repository to the container.

sudo nvidia-docker run --rm --name trtserver -p 8000:8000 -p 8001:8001 \
-v ~/models:/models nvcr.io/nvidia/tensorrtserver:18.09-py3 trtserver \
--model-store=/models

--rm removes the container after it exits, and --name assigns it a name. -p exposes ports 8000 (REST) and 8001 (gRPC) on the host and maps them to the respective container ports. -v mounts the model repository folder on the host, which is /models in my case, into /models in the container, which is then referenced by --model-store as the location to look for servable model graphs. If everything goes fine you should see console output similar to the one below. If you don't want to see the output of the server, you can start the container in detached mode using the -d flag on startup.

===============================
== TensorRT Inference Server ==
===============================

NVIDIA Release 18.09 (build 688039)

Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
Copyright 2018 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

I1014 10:38:55.951258 1 server.cc:631] Initializing TensorRT Inference Server
I1014 10:38:55.951339 1 server.cc:680] Reporting prometheus metrics on port 8002
I1014 10:38:56.524257 1 metrics.cc:129] found 1 GPUs supported power usage metric
I1014 10:38:57.141885 1 metrics.cc:139]   GPU 0: Tesla K80
I1014 10:38:57.142555 1 server.cc:884] Starting server 'inference:0' listening on
I1014 10:38:57.142583 1 server.cc:888]  localhost:8001 for gRPC requests
I1014 10:38:57.143381 1 server.cc:898]  localhost:8000 for HTTP requests
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 235] RAW: Entering the event loop ...
I1014 10:38:57.880877 1 server_core.cc:465] Adding/updating models.
I1014 10:38:57.880908 1 server_core.cc:520]  (Re-)adding model: model_1
I1014 10:38:57.981276 1 basic_manager.cc:739] Successfully reserved resources to load servable {name: model_1 version: 1}
I1014 10:38:57.981313 1 loader_harness.cc:66] Approving load for servable version {name: model_1 version: 1}
I1014 10:38:57.981326 1 loader_harness.cc:74] Loading servable version {name: model_1 version: 1}
I1014 10:38:57.982034 1 base_bundle.cc:180] Creating instance model_1_0_0_gpu0 on GPU 0 (3.7) using model.savedmodel
I1014 10:38:57.982108 1 bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /models/model_1/1/model.savedmodel
I1014 10:38:57.982138 1 reader.cc:31] Reading SavedModel from: /models/model_1/1/model.savedmodel
I1014 10:38:57.983817 1 reader.cc:54] Reading meta graph with tags { serve }
I1014 10:38:58.041695 1 cuda_gpu_executor.cc:890] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I1014 10:38:58.042145 1 gpu_device.cc:1405] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
I1014 10:38:58.042177 1 gpu_device.cc:1455] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
I1014 10:38:58.042192 1 gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
I1014 10:38:58.042200 1 gpu_device.cc:971]      0 
I1014 10:38:58.042207 1 gpu_device.cc:984] 0:   N 
I1014 10:38:58.067349 1 loader.cc:113] Restoring SavedModel bundle.
I1014 10:38:58.074260 1 loader.cc:148] Running LegacyInitOp on SavedModel bundle.
I1014 10:38:58.074302 1 loader.cc:233] SavedModel load for tags { serve }; Status: success. Took 92161 microseconds.
I1014 10:38:58.075314 1 gpu_device.cc:1455] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
I1014 10:38:58.075343 1 gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
I1014 10:38:58.075348 1 gpu_device.cc:971]      0 
I1014 10:38:58.075353 1 gpu_device.cc:984] 0:   N 
I1014 10:38:58.083451 1 loader_harness.cc:86] Successfully loaded servable version {name: model_1 version: 1}

There is also a warning showing that you should start the container using the following arguments:

--shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864

You can do this of course. However, in this example I did not use them.

Installing the Python Client

Now it is time to test our prediction server. TensorRT Server comes with several client libraries that allow you to send data to the server and get predictions. The recommended method of building the client libraries is again – Docker. To use the Docker container that contains the client libraries, you need to clone the respective GitHub repo using:

git clone https://github.com/NVIDIA/dl-inference-server.git

Then, cd into the folder dl-inference-server and run

docker build -t inference_server_clients .

This will build the container on your machine (which takes some time). To use the client libraries from the container on your host, you need to mount a host folder into the container. First, start the container in an interactive session (-it flag):

docker run --name tensorrtclient --rm -it -v /tmp:/tmp/host inference_server_clients

Then, run the following commands in the container’s shell (you may have to create /tmp/host first):

cp build/image_client /tmp/host/.
cp build/perf_client /tmp/host/.
cp build/dist/dist/tensorrtserver-*.whl /tmp/host/.
cd /tmp/host

The code above copies the prebuilt image_client and perf_client libraries as well as the Python wheel into the mounted folder and makes them accessible from the host system. Lastly, you need to install the Python client library using

pip install tensorrtserver-0.6.0-cp35-cp35m-linux_x86_64.whl

on the container system. Finally! That’s it, we’re ready to go (sounds like it was an easy way)!

Inference using the Python Client

Using Python, you can easily perform predictions using the client library. In order to send data to the server, you need an InferContext() from the tensorrtserver.api module that takes the TRT Server IP and port as well as the desired model name. If you are using the TRT Server in the cloud, make sure that you have appropriate firewall rules allowing for traffic on ports 8000 and 8001.

from tensorrtserver.api import *
import numpy as np

# Some parameters
outputs = 2
batch_size = 1

# Init client
trt_host = '123.456.789.0:8000' # local or remote IP of TRT Server
model_name = 'model_1'
ctx = InferContext(trt_host, ProtocolType.HTTP, model_name)

# Sample some random data
data = np.float32(np.random.normal(0, 1, [1, 5]))

# Get prediction
# Layer names correspond to the names in config.pbtxt
response = ctx.run(
    {'dense_1_input': data}, 
    {'dense_2_output': (InferContext.ResultFormat.CLASS, outputs)},
    batch_size)

# Result
print(response)
# {'output0': [[(0, 1.0, 'class_0'), (1, 0.0, 'class_1')]]}

Note: It is important that the data you are sending to the server matches the floating-point precision previously defined for the input layer in the model configuration file. Furthermore, the names of the input and output layers must exactly match those of your model. If everything went well, ctx.run() returns a dictionary of predicted values, which you can further postprocess according to your needs.

Conclusion and Outlook

Wow, that was quite a ride! However, TensorRT Server is a great product for putting your deep learning models into production. It is fast, scalable, and full of neat features for production usage. I did not go into details regarding inference performance. If you're interested in more, make sure to check out this blog post from NVIDIA. I must admit that, in comparison to TRT Server, TF Serving is much more handy when it comes to installation, model deployment, and usage. However, compared to TRT Server, it lacks some functionalities that are handy in production. Bottom line: my team and I will definitely add TRT Server to our production tool stack for deep learning models.

If you have any comments or questions on my story, feel free to comment below! I will try to answer them. Also, feel free to use my code or share this story with your peers on social platforms of your choice.

If you’re interested in more content like this, join our mailing list, constantly bringing you new data science, machine learning and AI reads and treats from me and my team at STATWORX right into your inbox!

Lastly, follow me on LinkedIn or Twitter, if you’re interested to connect with me.

References

Google AutoML Vision is a state-of-the-art cloud service from Google that is able to build deep learning models for image recognition fully automatically and from scratch. In this post, Google AutoML Vision is used to build an image classification model on the Zalando Fashion-MNIST dataset, a recent variant of the classical MNIST dataset, which is considered to be more difficult for ML models to learn than digit MNIST.

During the benchmark, both AutoML Vision training modes, “free” (0 $, limited to 1 hour of computing time) and “paid” (approx. 500 $, 24 hours of computing time), were used and evaluated.

The free AutoML model achieved a macro AUC of 96.4% and an accuracy score of 88.9% on the test set after a computing time of approx. 30 minutes (early stopping). The paid AutoML model achieved a macro AUC of 98.5% on the test set with an accuracy score of 93.9%.

Introduction

Recently, there has been growing interest in automated machine learning solutions. Products like H2O Driverless AI or DataRobot, just to name a few, aim at corporate customers and continue to make their way into professional data science teams and environments. For many use cases, AutoML solutions can significantly speed up time-to-model cycles and therefore allow for faster iteration and deployment of models (and actually start saving / making money in production).

Automated machine learning solutions will transform the data science and ML landscape substantially in the next 3-5 years. Many ML models or applications that nowadays require human input or expertise will likely be partly or fully automated by AI / ML models themselves. Likely, this will also yield a decline in overall demand for “classical” data science profiles in favor of more engineering- and operations-related data science roles that bring models into production.

A recent example of the rapid advancements in automated machine learning is the development of deep learning image recognition models. Not too long ago, building an image classifier was a very challenging task that only few people were actually capable of doing. Due to computational, methodological, and software advances, barriers have been dramatically lowered to the point where you can build your first deep learning model with Keras in 10 lines of Python code and get “okayish” results.

Undoubtedly, there will still be many ML applications and cases that cannot be (fully) automated in the near future. Those cases will likely be more complex, because basic ML tasks, such as fitting a classifier to a simple dataset, can and will easily be automated by machines.

At this point, the first attempts at automating machine learning are being made. Google, as well as other companies, is investing in AutoML research and product development. One of the first professional automated ML products on the market is Google AutoML Vision.

Google AutoML Vision

Google AutoML Vision (at this point in beta) is Google's cloud service for automated machine learning on image classification tasks. Using AutoML Vision, you can train and evaluate deep learning models without any knowledge of coding or neural networks whatsoever.

AutoML Vision operates in the Google Cloud and can be used either through a graphical user interface or via REST, the command line, or Python. AutoML Vision implements strategies from Neural Architecture Search (NAS), currently a scientific field of high interest in deep learning research. NAS is based on the idea that another model, typically a neural network or reinforcement learning model, designs the architecture of the neural network that aims to solve the machine learning task. Cornerstones in NAS research were the papers by Zoph et al. (2017) and Pham et al. (2018). The latter has also been implemented in the Python package autokeras (currently in pre-release phase) and makes neural architecture search feasible on desktop computers with a single GPU, as opposed to the 500 GPUs used in Zoph et al.

The idea that an algorithm is able to discover architectures of a neural network seems very promising; however, it is still somewhat limited by computational constraints (I hope you don't mind that I consider a 500-1000 GPU cluster a computational constraint). But how well does neural architecture search actually work in a pre-market-ready product?

Benchmark

In the following section, Google AutoML Vision is used to build an image recognition model based on the Fashion-MNIST dataset.

Dataset

The Fashion-MNIST dataset is supposed to serve as a “drop-in replacement” for the traditional MNIST dataset and has been open-sourced by the research department of Europe's online fashion giant Zalando (check the Fashion-MNIST GitHub repo and the Zalando research website). It contains 60,000 training and 10,000 test images of 10 different clothing categories (tops, pants, shoes, etc.). Just like MNIST, each image is a 28×28 grayscale image. It shares the same image size and structure of training and test images. Below are some examples from the dataset:

The makers of Fashion-MNIST argue that the traditional MNIST dataset is nowadays too simple a task to solve – even simple convolutional neural networks achieve >99% accuracy on the test set, whereas classical ML algorithms easily score >97%. For this and other reasons, Fashion-MNIST was created.

The Fashion-MNIST repo contains helper functions for loading the data as well as some scripts for benchmarking and testing your models. Also, there's a neat visualization of an embedding of the data on the repo. After cloning, you can import the Fashion-MNIST data using a simple Python function (check the code in the next section) and start to build your model.

Using Google AutoML Vision

Preparing the data

AutoML offers two ways of data ingestion: (1) upload a zip file that contains the training images in different folders, corresponding to the respective labels, or (2) upload a CSV file that contains the Google Cloud Storage (GS) filepaths, labels, and optionally the data partition for the training, validation, and test set. I decided to go with the CSV file because you can define the data partition (flag names are TRAIN, VALIDATION and TEST) in order to keep control over the experiment. Below is the required structure of the CSV file that needs to be uploaded to AutoML Vision (the actual file must not contain the header row).

partition file label
TRAIN gs://bucket-name/folder/image_0.jpg 0
TRAIN gs://bucket-name/folder/image_1.jpg 2
VALIDATION gs://bucket-name/folder/image_22201.jpg 7
VALIDATION gs://bucket-name/folder/image_22202.jpg 9
TEST gs://bucket-name/folder/image_69998.jpg 4
TEST gs://bucket-name/folder/image_69999.jpg 1

Just like MNIST, the Fashion-MNIST data contains the pixel values of the respective images. To actually upload image files, I developed a short Python script that takes care of the image creation, export, and upload to GCP. The script iterates over each row of the Fashion-MNIST dataset, exports the image, and uploads it into a Google Cloud Storage bucket.

import os
import gzip
import numpy as np
import pandas as pd
from google.cloud import storage
from keras.preprocessing.image import array_to_img


def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte.gz'
                               % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte.gz'
                               % kind)

    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,
                               offset=8)

    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)

    return images, labels


# Import training data
X_train, y_train = load_mnist(path='data', kind='train')
X_test, y_test = load_mnist(path='data', kind='t10k')

# Split validation data
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=10000)

# Dataset placeholder
files = pd.DataFrame({'part': np.concatenate([np.repeat('TRAIN', 50000),
                                              np.repeat('VALIDATION', 10000),
                                              np.repeat('TEST', 10000)]),
                      'file': np.repeat('file', 70000),
                      'label': np.repeat('label', 70000)})

# Stack training and test data into single arrays
X_data = np.vstack([X_train, X_valid, X_test])
y_data = np.concatenate([y_train, y_valid, y_test])

# GS path
gs_path = 'gs://secret/fashionmnist'

# Storage client
storage_client = storage.Client.from_service_account_json(json_credentials_path='secret.json')
bucket = storage_client.get_bucket('secret-bucket')

# Fill matrix
for i, x in enumerate(X_data):
    # Console print
    if i % 1000 == 0:
        print('Uploading image {image}'.format(image=i))
    # Reshape and export image
    img = array_to_img(x=x.reshape(28, 28, 1))
    img.save(fp='fashionmnist' + '/' + 'image_' + str(i) + '.jpg')
    # Add info to data frame
    files.iloc[i, 1] = gs_path + '/' + 'image_' + str(i) + '.jpg'
    files.iloc[i, 2] = y_data[i]
    # Upload to GCP
    blob = bucket.blob('fashionmnist/' + 'image_' + str(i) + '.jpg')
    blob.upload_from_filename('fashionmnist/' + 'image_' + str(i) + '.jpg')
    # Delete image file
    os.remove('fashionmnist/' + 'image_' + str(i) + '.jpg')

# Export CSV file
files.to_csv(path_or_buf='fashionmnist.csv', header=False, index=False)

The function load_mnist is from the Fashion-MNIST repository and imports the training and test arrays into Python. After importing the training set, 10,000 examples are sampled and stored as validation data using train_test_split from sklearn.model_selection. The training, validation, and test arrays are then stacked into X_data in order to have a single object for iteration. A placeholder DataFrame is initialized to store the information (partition, filepath, and label) required by AutoML Vision. storage from google.cloud connects to GCP using a service account json file (which I will, of course, not share here). Finally, the main process takes place, iterating over X_data, generating an image for each row, saving it to disk, uploading it to GCP, and deleting the image since it is no longer needed. Lastly, I uploaded the exported CSV file into the Google Cloud Storage bucket of the project.

Getting into AutoML

AutoML Vision is currently in Beta, which means that you have to apply before trying it out. Since me and my colleagues are currently exploring the usage of automated machine learning in a computer vision project for one of our customers, I already have access to AutoML Vision through the GCP console.

The start screen looks pretty unspectacular at this point. You can start by clicking on “Get started with AutoML” or read the documentation, which is pretty basic so far but informative, especially when you're not familiar with basic machine learning concepts such as train-test splits, overfitting, precision / recall, etc.

After you started, Google AutoML takes you to the dataset dialog, which is the first step on the road to the final AutoML model. So far, nothing to report here. Later, you will find here all of your imported datasets.

Generating the dataset

After hitting “+ NEW DATASET” AutoML takes you to the “Create dataset” dialog. As mentioned before, new datasets can be added using two different methods, shown in the next image.

I’ve already uploaded the images from my computer as well as the CSV file containing the GS filepaths, partition information as well as the corresponding labels into the GS bucket. In order to add the dataset to AutoML Vision you must specify the filepath to the CSV file that contains the image GS-filepaths etc.

In the “Create dataset” dialog, you can also enable multi-label classification if you have multiple labels per image, which is a very helpful feature. After hitting “CREATE DATASET”, AutoML iterates over the provided file names and builds the dataset for modeling. What exactly it does is neither visible nor documented. This import process may take a while, so it shows you the funky “Knight Rider” progress bar.

After the import is finished, you will receive an email from GCP informing you that the import of the dataset is completed. I find this helpful because you don't have to keep the browser window open and stare at the progress bar all the time.

The email looks a bit weird, but hey, it’s still beta…

Training a model

Back to AutoML. The first thing you see after building your dataset are the imported images. In this example, the images are a bit pixelated because they are only 28×28 in resolution. You can navigate through the different labels using the nav bar on the left side and also manually add labels to so-far unlabeled images. Furthermore, you can request a human labeling service if you do not have any labels that come with your images. Additionally, you can create new labels if you need to add a category, etc.

Now let’s get serious. After going to the “TRAIN” dialog, AutoML informs you about the frequency distribution of your labels. It recommends a minimum count of $n=100$ labels per class (which I find quite low). Also, it seems to show you the frequencies of the whole dataset (train, validation, and test together). A frequency plot grouped by data partition would be more informative at this point, in my opinion.

A click on “start training” takes you to a popup window where you can define the model name and allocate the training budget (computing time / money) you are willing to invest. You have the choice between “1 compute hour”, which is free for 10 models every month, and “24 compute hours (higher quality)”, which comes with a price tag of approx. $480 (1 hour of AutoML computing costs $20). However, if the architecture search converges at an earlier point, you only pay for the computing time consumed so far, which I find reasonable and fair. Lastly, there is also the option to define a custom training time, e.g., 5 hours.

In this experiment, I tried both: the “free” version of AutoML, but I also went all-in and selected the 24-hours option to achieve the best model possible (“paid model”). Let’s see what you can expect from a $480 cutting-edge AutoML solution. After hitting “START TRAINING”, the familiar Knight Rider screen appears, telling you that you can close the browser window and let AutoML do the rest. Nice.

Results and evaluation

First, let’s start with the free model. It took approx. 30 minutes of training and seemed to converge to a solution very quickly. I am not sure what exactly AutoML does when it evaluates convergence criteria, but it seems to differ between the free and the paid model, because the free model converged after around 30 minutes of computing and the paid model did not.

The overall model metrics of the free model look pretty decent: an average precision of 96.4% on the test set, with a macro class-1 precision of 90.9% and a recall of 87.7%. The current accuracy benchmark on the Fashion-MNIST dataset is 96.7% (WRN40-4, 8.9M parameters), followed by 96.3% (WRN-28-10 + Random Erasing), while the accuracy of the low-budget model is only 89.0%. So the free AutoML model is pretty far away from the current Fashion-MNIST benchmark. Below, you’ll find the screenshot of the free model’s metrics.

The model metrics of the paid model look significantly better. It achieved an average precision of 98.5% on the test set, with a macro class-1 precision of 95.0%, a recall of 92.8%, and an accuracy score of 93.9%. Those results are close to the current benchmark, however not as close as I had hoped. Below, you’ll find the screenshot of the paid model’s metrics.

The “EVALUATE” tab also shows you further detailed metrics such as precision / recall curves as well as sliders for classification cutoffs that affect the respective model metrics. At the bottom of the page you’ll find the confusion matrix with the relative frequencies of correctly and misclassified examples. Furthermore, you can inspect images of false positives and false negatives per class (which is very helpful if you want to understand why and when your model gets something wrong). Overall, the model evaluation functionalities are limited but user-friendly. As a more advanced user, of course, I would like to see more sophisticated features, but considering the target group and the status of development, I think it is pretty decent.

Prediction

After fitting and evaluating your model, you can use several methods to predict new images. First, you can use the AutoML user interface to upload new images from your local machine. This is a great way for inexperienced users to apply their model to new images and get predictions. For advanced users and developers, AutoML Vision exposes the model through an API on GCP while taking care of all the technical infrastructure in the background. A simple Python script shows the basic usage of the API:

from google.cloud import automl_v1beta1


# Define client from service account json
client = automl_v1beta1.PredictionServiceClient.from_service_account_json(filename='automl-XXXXXX-d3d066fe6f8c.json')

# Endpoint of the deployed model
name = 'projects/automl-XXXXXX/locations/us-central1/models/ICNXXXXXX'

# Import a single image as raw bytes
with open('image_10.jpg', 'rb') as ff:
    img = ff.read()

# Define payload
payload = {'image': {'image_bytes': img}}

# Prediction request against the AutoML API
response = client.predict(name=name, payload=payload, params={})
print(response)

# Console output
payload {
  classification {
    score: 0.9356002807617188
  }
  display_name: "a_0"
}

As a third method, it is also possible to curl the API from the command line, if you want to go full nerdcore. I think the automated API exposure is a great feature because it lets you integrate your model into all kinds of scripts and applications. Furthermore, Google takes care of all the nitty-gritty details that come into play when you want to scale the model to hundreds or thousands of simultaneous API requests in a production environment.
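
For the sake of completeness, such a curl call could look roughly like the sketch below. The project and model IDs are placeholders again, the image has to be base64-encoded into the JSON body, and the exact request schema may differ slightly depending on the API version:

# encode the image as a single base64 line (on macOS: base64 image_10.jpg | tr -d '\n')
IMG=$(base64 -w 0 image_10.jpg)
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  -d "{\"payload\": {\"image\": {\"imageBytes\": \"${IMG}\"}}}" \
  https://automl.googleapis.com/v1beta1/projects/automl-XXXXXX/locations/us-central1/models/ICNXXXXXX:predict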

Conclusion and outlook

In a nutshell, even the free model achieved pretty good results on the test set, given that the actual amount of time invested in the model was only a fraction of the time it would have taken to build the model manually. The paid model achieved significantly better results, however at a price of $480. Obviously, the paid service is targeted at data science professionals and companies.

AutoML Vision is only one part of a set of new AutoML applications coming to the Google Cloud (check these announcements from Google Next 18), further shaping the platform’s positioning in the direction of machine learning and AI.

Personally, I am confident that automated machine learning solutions will continue to make their way into professional data science projects and applications. With automated machine learning, you can (1) build baseline models for benchmarking your custom solutions, (2) iterate on use cases and data products faster, and (3) get more quickly to the point where you actually start to make money with your data – in production.

A framework to automate your work – How to set up Airflow

In the first part of this blog post, we talked about what a DAG is, how to apply this mathematical concept to project planning and programming, and why we at STATWORX decided to use Airflow over other workflow managers. In this part, we will get more technical and walk through a quite informative hello-world program as well as how to set up Airflow for the different setups you might face. If you are just interested in the technical part and therefore do not want to read the first one, but still want a recap – here is a summary:

  • DAG is short for directed acyclic graph and as such can represent relationships and dependencies.
  • This last aspect can be used in project management, as it makes clear which tasks can run independently of each other and which cannot.
  • The same properties can be used in programming, as software can determine which jobs can run concurrently or in which order the others have to finish (or fail).

Why we chose Airflow:

  1. No Cron – With Airflow’s included scheduler, we don’t need to rely on cron to schedule our DAGs and only use one framework (unlike Luigi).
  2. Code Bases – In Airflow all the workflows, dependencies, and scheduling are done in Python code. Therefore, it is rather easy to build complex structures and extend the flows.
  3. Language – Python is a language somewhat natural to pick up, and that skill was already available in our team.

Preparation

The first thing was to set up a new virtual environment with Python and virtualenv.

$ pip install virtualenv   # if it hasn't been installed yet
$ cd                       # change into home directory

# create a separate folder for all environments
$ mkdir env
$ cd env
$ virtualenv airflow

Once it has been created, we can source this environment whenever we want to work with Airflow, so we don’t get into conflict with other dependencies.

$ source ~/env/airflow/bin/activate

Then we can install all the Python packages we will need:

$ pip install -U pip setuptools wheel \
    psycopg2 \
    Cython \
    pytz \
    pyOpenSSL \
    ndg-httpsclient \
    pyasn1 \
    psutil \
    apache-airflow[postgres]

A Small Breeze

Once our setup is done, we can check whether Airflow is correctly installed by typing airflow version into the bash, and you should see something like this:

[Image: output of airflow version]

Initially, Airflow runs with an SQLite database, which cannot execute more than one task at a time and should therefore be switched out once you get serious about it. More on this later. Next, let us start with the typical hello-world example. Navigate to your AIRFLOW_HOME path, which is by default a folder called airflow in your home directory. If you want to change this, edit the environment variable with export AIRFLOW_HOME=/your/new/path and call airflow version once again.

# ~/airflow/dags/HelloWorld.py

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

def print_hello():
    return 'Hello world!'

dag = DAG('hello_world',
          description='Simple tutorial DAG',
          start_date=datetime.now() - timedelta(days=4),
          schedule_interval='0 12 * * *'
          )

dummy_operator = DummyOperator(task_id='dummy_task', retries=3, dag=dag)

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)

dummy_operator >> hello_operator  # same as dummy_operator.set_downstream(hello_operator)

The first nine lines should be somewhat self-explanatory: just the import of the necessary libraries and the definition of the ‘hello world’ function. The interesting part starts at line ten. Here, we define the core of our workflow, a DAG object, with the identifier hello_world in this case and a small description of what this workflow will be used for and what it does (line 10). As you might have suspected, the argument start_date defines the beginning date of the task. This date should always be in the past. Otherwise, the task would be triggered and keep asking whether it can run, and as such it remains active until it is scheduled. The schedule_interval defines the periods in which the graph should be executed. We set it either with a cron-like notation, as above, or with some syntactic sugar Airflow offers. In the example above, we define that the task should run daily at 12:00. The fact that it should run daily could also have been expressed with schedule_interval='@daily'.

The cron notation follows the schema minute - hour - day (of the month) - month - day (of the week), like min h d m wd. With the usage of * as a wildcard, we gain the ability to schedule at very flexible intervals. Let’s say we want a job to run on every first day of the month at twelve o’clock. In this case, we don’t want a specific month nor a specific day of the week, and as such we replace those placeholders with the wildcard * (min h d * *). Since it should run at 12:00, we replace min with 0 and h with 12. Last but not least, we plug in the day of the month as 1 and get our final cron notation 0 12 1 * *. If we don’t want to be this specific but rather just say run daily or hourly starting from the start_date, we can use Airflow’s sugar – @daily, @hourly, @monthly or @yearly.
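
As a small illustration of the notation (with purely hypothetical DAG ids), here are the schedules discussed above expressed once as cron strings and once with Airflow's sugar:

from datetime import datetime
from airflow import DAG

start = datetime(2018, 1, 1)

# every day at 12:00, expressed as cron notation
daily_noon = DAG('daily_noon', start_date=start, schedule_interval='0 12 * * *')

# every day at midnight, expressed with Airflow's syntactic sugar
daily_sugar = DAG('daily_sugar', start_date=start, schedule_interval='@daily')

# every first day of the month at 12:00
monthly_noon = DAG('monthly_noon', start_date=start, schedule_interval='0 12 1 * *')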

Once we have this DAG instance, we can start to fill it with tasks. These are represented by instances of operators in Airflow. Here, we instantiate a DummyOperator and a PythonOperator. Both need to be assigned a unique id, but this time it only needs to be unique within the workflow. As the first operator, we define a DummyOperator, which does nothing at all. We only want it to fill our graph so that we can test Airflow with a scenario as simple as possible. The second one is a PythonOperator. In addition to the assignment to a graph and the id, the operator requires a function that will be executed once the task is triggered. Now, we can use our print_hello function and attach it to our workflow through the PythonOperator.

Before we can finally execute our flow, we still need to set the relations between our tasks. This linking is done either with the binary operators << and >> or by calling the set_upstream and set_downstream methods, respectively. This way, we can set the dependency that the DummyOperator first needs to run and succeed before our PythonOperator is executed.

Now that our code is fine, we should test it. Therefore, we should run it directly in the Python interpreter to check whether we have a syntax mistake. So, either run it in an IDE or in the terminal with the command python hello_world.py. If the interpreter doesn’t throw an error – congratulations, you didn’t screw up too badly. Next, we need to check whether Airflow is aware of our DAG with airflow list_dags. Now, we should see our hello_world id in the printed list. If so, we can check whether each task is assigned to it with airflow list_tasks hello_world. Again, we should see some familiar ids, namely dummy_task and hello_task. So far so good; it seems like at least the assignment worked. Next up is a unit test of the individual operators with airflow test hello_world dummy_task 2018-01-01 and airflow test hello_world hello_task 2018-01-01. Hopefully, this doesn’t raise any errors, and we can continue.
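
Put together, these sanity checks look like this in the (Airflow 1.x) CLI:

$ python ~/airflow/dags/HelloWorld.py            # syntax check
$ airflow list_dags                              # hello_world should show up
$ airflow list_tasks hello_world                 # dummy_task, hello_task
$ airflow test hello_world dummy_task 2018-01-01
$ airflow test hello_world hello_task 2018-01-01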

Now that we could deploy our example workflow, we first need to initialize Airflow entirely. This requires three commands before we can trigger our task manually in a fourth step:

  1. airflow initdb to initiate the database where Airflow saves the workflows and their states:
    [Image: output of airflow initdb]
  2. airflow webserver to start the web server at localhost:8080, where we can reach the web interface:
    [Image: output of airflow webserver]
  3. airflow scheduler to start the scheduling process of the DAGs, such that the individual workflows can be triggered:
    [Image: output of airflow scheduler]
  4. airflow trigger_dag hello_world to trigger our workflow and place it on the schedule.

Now we can either open a web browser and navigate to the respective website or call open http://localhost:8080/admin/ in the terminal, and it should lead us to a web page like this.

[Image: Airflow web UI with the hello_world DAG]

At the bottom, you should see your creation and the light green circle indicates that our flow is scheduled and running. The only thing left for us to do is to wait until it is executed. In the meantime, we can talk about the setup and how we could use some of the other executors.

The Backend

As mentioned before – once we want to get serious about the execution of our graphs, we need to change the backend of Airflow. Initially, it uses a simple SQLite database, which limits Airflow to executing one task at a time sequentially. Therefore, we will first change the connected database to PostgreSQL. In case you haven’t installed Postgres yet and need help with it, I’d recommend you check out its [wiki](https://wiki.postgresql.org/wiki/Detailed_installation_guides). I couldn’t possibly describe the process as well as that page does. For those who are on a Linux-based system (sorry, Windows), try sudo apt-get install postgresql-client, or use homebrew on a Mac – brew install postgresql. Another easy way would be to use a Docker container with the respective image.

Now, let us create a new database for Airflow, where all the metadata will be saved, by typing createdb airflow into the terminal. Next, we need to edit the airflow.cfg file, which should have appeared in your AIRFLOW_HOME folder (which is again by default airflow in your home directory), and afterwards restart steps 1-4 from above (initdb…). Fire up your favorite editor and look for line 32, sql_alchemy_conn =. Here we are going to replace the SQLite connection string with the one of our PostgreSQL server and a new driver. This string is made up of postgresql+psycopg2://IPADDRESS:PORT/DBNAME?user=USERNAME&password=PASSWORD

The first part tells sqlalchemy that the connection will lead to PostgreSQL and that it should use the psycopg2 driver to connect to it. In case you have Postgres installed locally (or in a container that maps to localhost) and did not change the standard port of 5432, our IPADDRESS:PORT could be translated to localhost:5432 or simply localhost. The DBNAME would be changed to airflow in our case, as we just created that database for this purpose. The last two parts depend on what you have chosen as security measures. Finally, we might end up with a line that looks like this:

sql_alchemy_conn = postgresql+psycopg2://localhost/airflow?user=postgres&password=password  

Once we have done this, we can also change our executor in line 27 from executor = SequentialExecutor to executor = LocalExecutor. That way, every task will be started as a subprocess, and as such the parallelization will happen locally. This approach works great as long as our jobs are not too complicated or don’t have to run on multiple machines.
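
In airflow.cfg, the two changed lines would then look roughly like this (the exact line numbers may differ between Airflow versions):

# airflow.cfg (excerpt)
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://localhost/airflow?user=postgres&password=password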

Once we reach that point, we need Celery as executor. It is an asynchronous task/job queue based on distributed message passing. However, to use the CeleryExecutor, we need one more piece of software – a message broker. A message broker is an intermediary program module that translates a message from the ‘language’ of the sender into one for the receiver. The two most common options are [redis](https://redis.io/) and [rabbitmq](https://www.rabbitmq.com/). Use whatever you feel most comfortable with. Since we used rabbitmq, the whole process will continue with this broker, but it should be more or less analogous for redis.

Again, for Linux and for Mac users with homebrew, it is a one-liner to install. Simply type into your terminal sudo apt-get install rabbitmq-server or brew install rabbitmq, and you are done. Next, we need a new user with a password and a virtual host. Both – user and host – can be created in the terminal by utilizing rabbitmq’s command-line tool rabbitmqctl. Let’s say we want to create a new user called myuser with mypassword and a virtual host myvhost. This can be achieved as follows:

$ rabbitmqctl add_user myuser mypassword  
$ rabbitmqctl add_vhost myvhost  
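
Depending on your RabbitMQ setup, the new user may additionally need permissions on the virtual host, which can be granted with the same tool:

$ rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"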

However, now back to Airflow’s configuration. Navigate your editor to line 230, and you will hopefully see broker_url =. This connection string is similar to the one for the database and is built with the pattern BROKER://USER:PASSWORD@IP:PORT/HOST. Our broker has the acronym amqp, and we can plug in our newly created user, password, and host. Unless you have changed the port or use a remote server, your line should look something like this:

broker_url = amqp://myuser:mypassword@localhost:5672/myvhost  

Next, we need to grant Celery access to our airflow database and fill line 232 with db+postgresql://localhost:5432/airflow?user=postgres&password=password.

This string should be mostly equivalent to the one we used before. We only need to drop the driver psycopg2 and instead add db+ at the beginning. And that’s it! All three executors are now at your disposal, and the setups are complete. Independent of the executor you have chosen, once you change the configuration, you need to repeat steps 1-4 – initializing the DB and restarting the scheduler and the web server. If you do this now, you will notice that the prompt has changed slightly, as it shows which executor you are using.

[Image: airflow webserver output with the CeleryExecutor]
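
Taken together, the Celery-related part of airflow.cfg would then look roughly like this (depending on your Airflow version, the result-backend key is called celery_result_backend or result_backend):

# airflow.cfg (excerpt)
executor = CeleryExecutor
broker_url = amqp://myuser:mypassword@localhost:5672/myvhost
celery_result_backend = db+postgresql://localhost:5432/airflow?user=postgres&password=password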

Conclusion

Airflow is an easy-to-use, code-based workflow manager with an integrated scheduler and multiple executors to scale as needed. If you want to execute a flow sequentially, or if there is nothing that could run concurrently, the default SQLite database and the SequentialExecutor should do the job. If you want to use Airflow to start multiple tasks at the same time while keeping track of their dependencies, you should first change the database and switch to the LocalExecutor for local multiprocessing. Moreover, thanks to Celery, we are even able to use multiple machines to execute even more advanced and complex workflows without much effort or worry.
