Introduction
In my previous articles, we talked about how to set up a Cloud Run application for MLFlow Server, and Kubeflow on GKE with an Ingress, a load balancer, and IAP configured to only allow traffic from specific IP addresses. In this article, I will talk about how to deploy ML endpoints that serve predictions as Cloud Run applications with ingress control so they can communicate with each other, as well as how to deploy a Vertex AI serving endpoint and a Kubeflow KServe endpoint.
Architecture
Here is an overview of the MLOps workflow.
And in today’s post, we will focus on this part.
To follow along with this blog post, the repo is here
Learnings
In today’s post, we will:
1 - Set up a VPC connector for the Cloud Run application
2 - Create a FastAPI application that uses MLFlow model to make predictions
3 - Deploy Cloud Run applications with appropriate configuration
4 - Make a request to the FastAPI application to make predictions
5 - Set up a Vertex AI endpoint (optional)
6 - Set up a Kubeflow KServe endpoint (optional)
Set Up VPC connector
The reason for setting up the Cloud Run applications with ingress control is that, after setting up MLFlow as a Cloud Run application, we will build an API endpoint that uses a model from MLFlow to make predictions. We will deploy that API endpoint as a second Cloud Run application, so the two applications need to communicate with each other. Because the MLFlow application is set up with ingress restrictions, an API endpoint deployed without any network configuration would not be able to reach the MLFlow API to retrieve the model for making predictions. Therefore, we will set up a VPC connector so the API endpoint can communicate with the MLFlow application.
A high-level application structure looks something like this
To create a VPC connector, in the GCP console we can navigate to VPC network -> Serverless VPC access -> Create connector, fill in the necessary details, and create the connector.
To check the available locations for VPC connectors, we can run the following command.
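The listing command below is the standard gcloud one; the optional create command that follows is only a sketch, with the connector name, network, and IP range as placeholders.

```bash
# List the regions where Serverless VPC Access connectors are available
gcloud compute networks vpc-access locations list

# (Optional) create the connector from the CLI instead of the console;
# the name, network, and IP range below are placeholders
gcloud compute networks vpc-access connectors create my-connector \
  --region us-central1 \
  --network default \
  --range 10.8.0.0/28
```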
Create FastAPI application
The folder structure for the FastAPI application looks like this
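The exact tree is in the repo; as a rough sketch based on the files referenced later in this post (the app/ directory and auth.py names are hypothetical), it could look like this:

```
prediction-api/
├── Dockerfile
├── pyproject.toml
├── skaffold.yaml
├── deploy/
│   └── production/
│       └── service.yaml
└── app/
    ├── main.py
    └── auth.py
```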
Our Dockerfile looks like this. Please note that the Python version we use for the API endpoint needs to match the environment where the prediction API will be called from; for example, if you will be calling the API from a Jupyter notebook, the Python version of that notebook should also be 3.11.
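The real Dockerfile is in the repo; a minimal sketch, assuming Python 3.11, the layout above, and a pyproject.toml configured as an installable package, might look like:

```dockerfile
# Match the Python version of the environment that will call the prediction API
FROM python:3.11-slim

WORKDIR /app

# Copy the project and install it (pyproject.toml pins the dependency versions;
# this assumes the project is set up as an installable package)
COPY . .
RUN pip install --no-cache-dir .

# Cloud Run routes traffic to port 8080 by default
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```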
Our pyproject.toml looks like this. As with the Dockerfile, besides the Python version, the scikit-learn and xgboost versions should match the environment where the prediction API will be called from, whether that is a Jupyter notebook, a local Python virtual env, etc.
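A hypothetical pyproject.toml sketch; the pins below are placeholders, and the important thing is that the Python, scikit-learn, and xgboost versions match the client environment:

```toml
[project]
name = "prediction-api"
version = "0.1.0"
# Keep these pins in sync with the environment that will call the API
requires-python = ">=3.11,<3.12"
dependencies = [
    "fastapi",
    "uvicorn",
    "mlflow",
    "scikit-learn==1.4.2",   # placeholder pin: match your client environment
    "xgboost==2.0.3",        # placeholder pin: match your client environment
]

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
```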
For the API endpoint, we will want a mechanism to authenticate requests. We can use FastAPI's HTTPBasic for this: a validate function checks the supplied credentials and is then used as a dependency of the API endpoint.
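A minimal sketch of that dependency (the auth.py module name and the env var names are my own assumptions; the post's actual implementation is in the repo):

```python
# app/auth.py (hypothetical module name)
import os
import secrets

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials

security = HTTPBasic()


def validate(credentials: HTTPBasicCredentials = Depends(security)) -> str:
    """Check the Basic Auth credentials against values injected via env vars."""
    correct_user = secrets.compare_digest(
        credentials.username, os.environ.get("API_USERNAME", "")
    )
    correct_password = secrets.compare_digest(
        credentials.password, os.environ.get("API_PASSWORD", "")
    )
    if not (correct_user and correct_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid credentials",
            headers={"WWW-Authenticate": "Basic"},
        )
    return credentials.username
```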
And in our main.py, we can set up the API endpoint like this, which ensures that every request to the endpoint is authenticated.
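A sketch of main.py wiring the dependency in; the model URI and payload schema are placeholders, and the MLflow tracking URI is expected to point at the MLFlow Cloud Run service:

```python
# app/main.py (sketch; registered model name and payload fields are placeholders)
import mlflow
import pandas as pd
from fastapi import Depends, FastAPI

from app.auth import validate

app = FastAPI()

# MLFLOW_TRACKING_URI should point at the MLFlow Cloud Run service,
# reachable through the VPC connector
model = mlflow.pyfunc.load_model("models:/my-model/Production")


@app.post("/predict")
def predict(payload: dict, user: str = Depends(validate)):
    """Return predictions for the given feature payload; requires Basic Auth."""
    features = pd.DataFrame([payload])
    prediction = model.predict(features)
    return {"prediction": prediction.tolist()}
```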
Deploy Cloud Run application with VPC connector
Deploy FastAPI application to Cloud Run using skaffold
The skaffold.yaml looks like this
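A rough sketch, assuming Skaffold v2's Cloud Run deployer; the image name, project, and region are placeholders, and the apiVersion should be checked against your Skaffold version:

```yaml
apiVersion: skaffold/v4beta6
kind: Config
build:
  artifacts:
    - image: us-central1-docker.pkg.dev/my-project/my-repo/prediction-api   # placeholder image
      docker:
        dockerfile: Dockerfile
manifests:
  rawYaml:
    - deploy/production/service.yaml
deploy:
  cloudrun:
    projectid: my-project        # placeholder GCP project
    region: us-central1          # placeholder region
```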
We can see the Skaffold configuration file is set up to reference deploy/production/service.yaml.
The service.yaml file looks like this
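A sketch of such a manifest; the service name, connector name, and image are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: prediction-api            # placeholder service name
spec:
  template:
    metadata:
      annotations:
        # Route all egress traffic through the Serverless VPC Access connector
        run.googleapis.com/vpc-access-connector: my-connector    # placeholder connector name
        run.googleapis.com/vpc-access-egress: all-traffic
    spec:
      containers:
        - image: us-central1-docker.pkg.dev/my-project/my-repo/prediction-api   # placeholder
          ports:
            - containerPort: 8080
```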
We can see we have set the run.googleapis.com/vpc-access-egress and run.googleapis.com/vpc-access-connector annotations to allow the Cloud Run application to use the VPC connector we created.
And finally, to deploy, we can run the command
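With Skaffold this is typically a single command (any profile or flags depend on how the repo's skaffold.yaml is set up):

```bash
# Build the image and deploy the Cloud Run service defined in deploy/production/service.yaml
skaffold run
```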
Redeploy MLFlow application to use VPC
Update the deploy/production/service.yaml to add the below lines under spec -> template -> metadata -> annotations
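Presumably these are the same two VPC annotations used for the FastAPI service above (the connector name is a placeholder):

```yaml
run.googleapis.com/vpc-access-connector: my-connector    # placeholder connector name
run.googleapis.com/vpc-access-egress: all-traffic
```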
The complete yaml file looks like this
To deploy
Make a request to the FastAPI application to make predictions
And then if we try to request predictions from the FastAPI application without an Authorization header, we will get an error like this
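A sketch of calling the endpoint with the requests library; the URL, credentials, and payload are placeholders. Without the auth argument the service responds with a 401:

```python
import requests

url = "https://prediction-api-xxxxx-uc.a.run.app/predict"   # placeholder Cloud Run URL
payload = {"feature_a": 1.0, "feature_b": 2.0}              # placeholder features

# No credentials -> 401 Unauthorized
resp = requests.post(url, json=payload)
print(resp.status_code, resp.json())

# With Basic Auth credentials -> prediction
resp = requests.post(url, json=payload, auth=("api-user", "api-password"))
print(resp.status_code, resp.json())
```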
Set Up Vertex AI endpoint (optional)
We already briefly discussed this in the previous post; to use Kubeflow to deploy a Vertex AI serving endpoint, we can execute the below.
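The post's actual code drives this from a Kubeflow pipeline; purely as an illustration of the underlying calls, here is a sketch using the Vertex AI Python SDK directly, with the project, bucket, and prebuilt serving image URI as placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the trained model artifact with a prebuilt serving container
model = aiplatform.Model.upload(
    display_name="xgb-model",
    artifact_uri="gs://my-bucket/models/xgb/",  # placeholder GCS path to the model artifact
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-7:latest"  # placeholder image
    ),
)

# Create an endpoint and deploy the model to it
endpoint = model.deploy(machine_type="n1-standard-2")
print(endpoint.resource_name)
```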
Set Up KServe endpoint (optional)
To deploy the endpoint to KServe
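As a sketch, an InferenceService manifest for an XGBoost model; the name and storageUri are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgb-predictor                          # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      storageUri: gs://my-bucket/models/xgb/   # placeholder GCS path
```

It can then be applied to the cluster with kubectl apply -f pointing at this file (plus a namespace flag if needed).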
Thank you for reading and have a nice day!