Optimizing ML Models in Production in the Cloud or at the Edge using Shadow Deployments
Published Feb 06 2024

In this blog post, guest blogger Martin Bald, Senior Manager of Developer Community at one of our startup partners, Wallaroo.AI, will continue to address model validation, this time through the method of shadow deployments. Testing and experimentation are a critical part of the ML lifecycle, because the ability to quickly experiment with and test new models in the real world helps data scientists continually learn, innovate, and improve AI-driven decision processes.

 

Introduction

In the first blog post in this series, we stepped through deploying a packaged ML model to an edge device in a retail computer vision (CV) use case. However, in the MLOps lifecycle, pushing a model into production is not the only step that needs care. In many situations, it's important to vet a model's performance in the real world before fully activating it. Real-world vetting can surface issues that may not have arisen during development, when models are only checked against hold-out data.

 

 

Shadow Deployments

Another way to vet your new model is to set it up in a shadow deployment. This is a good way to vet a new object detection model, because you can compare it directly against the champion model: does it detect the same objects, and does it identify them the same way? With shadow deployments, all the models in the experiment pipeline receive all the data, and all of their inferences are logged, as seen in the image below. However, the pipeline only returns the inferences from one model: the default, or champion, model.

 

[Image: shadow deployment pipeline, with all models receiving the data and only the champion model's inferences returned]
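
Conceptually, a shadow step behaves like the sketch below. This is only an illustration of the routing described above, not the Wallaroo implementation: every model scores the same input, every result is logged, and only the champion's result is returned to the caller. The model objects and their name()/predict() methods are hypothetical placeholders.

# Minimal sketch of the shadow-deployment pattern (illustration only)
def shadow_infer(champion, challengers, data, inference_log):
    results = {champion.name(): champion.predict(data)}
    for challenger in challengers:
        # Challengers score the same data, but their results are only logged
        results[challenger.name()] = challenger.predict(data)
    inference_log.append(results)        # every model's inference is recorded
    return results[champion.name()]      # only the champion's output is returned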

 

 

Shadow deployments are useful for "sanity checking" a model before it goes truly live. For example, you might have built a smaller, leaner version of an existing model using knowledge distillation or other model optimization techniques, as discussed here. A shadow deployment of the new model alongside the original model can help ensure that the new model meets desired accuracy and performance requirements before it's put into production.

 

Let’s go through the steps for a Shadow Deployment so we can see how it works in our retail CV scenario.

 

First, we upload both models in the same way as we did for the initial deployment.

 

# Upload the champion (control) and challenger object detection models as ONNX artifacts
control = wl.upload_model('mobilenet', "models/mobilenet.pt.onnx", framework=Framework.ONNX) \
    .configure(tensor_fields=['tensor'], batch_config="single")
challenger = wl.upload_model('resnet50', "models/frcnn-resnet.pt.onnx", framework=Framework.ONNX) \
    .configure(tensor_fields=['tensor'], batch_config="single")

 

 

Next we will specify the deployment configuration. Note that we have increased the memory allocation from 1Gi in our initial deployment configuration to 3Gi, since we are now sending data to multiple models at the same time and need more memory for that task.

 

# One replica with 1 CPU and 3Gi of memory, enough to serve both models side by side
deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("3Gi") \
    .build()

 

 

Instead of adding a model step as we did last time, we add a shadow deployment step, specifying the challenger model(s) to run in the background:

 

# The control model's inferences are returned to the caller; the challenger runs in the shadow
pipeline = wl.build_pipeline("cam-ny-1-shd") \
        .add_shadow_deploy(control, [challenger]) \
        .deploy(deployment_config = deployment_config)

 

 

Our output shows that deployment was successful.

[Image: pipeline deployment output confirming a successful deployment]
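
If you prefer to check the deployment state programmatically rather than reading the printed output, the pipeline exposes a status call. A minimal sketch (the shape of the returned status can vary between SDK versions):

# Check the pipeline's deployment state; a running deployment reports its engines here
print(pipeline.status())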

 

Now that this is deployed, we will run inference as normal to test:

 

import time

startTime = time.time()
infResults = pipeline.infer(dfImage)   # dfImage: the preprocessed camera image from the earlier deployment example
endTime = time.time()
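
Because this is a shadow deployment, the inference result contains outputs from both models. In the Wallaroo shadow deployment tutorials, the control model's outputs appear under out.* columns and each challenger's outputs under out_<model name>.* columns. Assuming infer() returned a pandas DataFrame (the default in recent SDK versions), a quick way to confirm is to list the output columns; the naming convention here is an assumption and may vary by SDK version:

# out.* columns hold the champion (control) outputs,
# out_<model name>.* columns hold each shadow (challenger) model's outputs
print([col for col in infResults.columns if col.startswith('out')])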

 

 

Champion and Challenger Results

We can plot the output results in the same way we did for our initial model deployment, overlaying the detections from both models on the product image.

 

# Build the parameters for the drawing utility. width, height, and resizedImage
# come from the image preprocessing step in the earlier post.
elapsed = 1.0   # placeholder ONNX runtime in nanoseconds (converted to seconds below)
results = {
    'model_name' : challenger.name(),
    'pipeline_name' : pipeline.name(),
    'width': width,
    'height': height,
    'image' : resizedImage,
    'inf-results' : infResults,
    'confidence-target' : 0.40,          # only draw detections above 40% confidence
    'inference-time': (endTime-startTime),
    'onnx-time' : int(elapsed) / 1e+9,
    'classes_file': "models/coco_classes.pickle",
    'color': utils.mapColors('AMBER')
}
image = utils.drawShadowDetectedObjectsFromInference(results, challenger.name())
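
Beyond eyeballing the overlays, you can also compare the two models programmatically, for example by counting how many detections each reports above the confidence target. The column names below ('out.confidences', 'out_resnet50.confidences') are assumptions based on the naming convention noted earlier and on the models' output schema:

# Count detections above the confidence target for champion vs. challenger
target = 0.40
champ_conf = infResults.loc[0, 'out.confidences']           # champion (control) confidences
chall_conf = infResults.loc[0, 'out_resnet50.confidences']  # challenger confidences
print(f"champion detections >= {target}: {sum(c >= target for c in champ_conf)}")
print(f"challenger detections >= {target}: {sum(c >= target for c in chall_conf)}")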

 

 

[Image: detected objects overlaid on the product image]

 

Hot-Swap in Challenger Model

Based on the output results, we can see that the confidence scores from the challenger model are higher than those from the champion model, indicating that the challenger is more confident in its predictions. We will therefore hot-swap the challenger into production as the new champion model. This takes only a couple of lines of code and does not require undeploying or stopping the production environment.

 

# Replace the shadow step with a single model step that serves the challenger
pipeline = pipeline.clear().add_model_step(challenger)
status = pipeline.deploy()   # redeploy in place; no need to undeploy first
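
To confirm that the swap took effect, you can re-run the same inference; only the new champion's outputs should now come back. A quick sketch reusing dfImage from above and assuming a DataFrame result:

# After the hot swap, the pipeline returns only the new champion's inferences
swapResults = pipeline.infer(dfImage)
print([col for col in swapResults.columns if col.startswith('out')])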

 

 

 

Conclusion

In this blog post we have seen one production ML model validation method in action: shadow deployments. Testing and experimentation are a critical part of the ML lifecycle, and shadow deployments let us quickly test new models against live production data, then make fast, informed decisions to replace the production champion model on the fly whenever a challenger shows better accuracy, all without halting production. This helps data scientists continually learn, innovate, and improve AI-driven decision processes.

 

The next blog post in this series will cover model observability for edge or multi-cloud production deployments.

 

If you want to try the steps in these blog posts, you can access the tutorials at this link and use the free inference servers available on the Azure Marketplace. Or you can download the free Wallaroo.AI Community Edition, which you can use with GitHub Codespaces.
