IEEE AI Coalition: Efficient and Scalable AI Inference: Navigating the Challenges of Model Deployment at Scale
In machine learning, model deployment strategies are crucial for managing high-scale infrastructure, where the goal is efficient, scalable, and cost-effective inference. This session covers the challenges of deploying models, both small and large, in heterogeneous environments where models consume varying amounts of resources such as GPUs. It explores the complexities of orchestrating such a system, emphasizing that efficient GPU usage is a priority: idle accelerators waste compute and money, yet inference requests must still be served quickly. The session then examines design approaches for navigating these trade-offs and aims to equip attendees with the knowledge to design infrastructure for deploying and managing models effectively.
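To make the GPU-utilization concern concrete, here is a minimal sketch of one common approach to packing models onto accelerators: best-fit-decreasing placement by memory footprint, so that fewer devices sit partially idle. This is an illustrative assumption, not the specific technique covered in the session; the `GPU` class, `place_models` function, and memory figures are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    gpu_id: int
    capacity_gb: float
    used_gb: float = 0.0
    models: list = field(default_factory=list)

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb

def place_models(models: dict, gpus: list) -> dict:
    """Greedy best-fit-decreasing placement.

    Place the largest models first, and put each model on the GPU with
    the least remaining memory that still fits it. Packing GPUs tightly
    leaves whole devices free for other work instead of scattering
    small idle gaps across the fleet.
    """
    placement = {}
    for name, mem_gb in sorted(models.items(), key=lambda kv: -kv[1]):
        candidates = [g for g in gpus if g.free_gb >= mem_gb]
        if not candidates:
            raise RuntimeError(f"no GPU can host {name} ({mem_gb} GB)")
        best = min(candidates, key=lambda g: g.free_gb)  # tightest fit
        best.used_gb += mem_gb
        best.models.append(name)
        placement[name] = best.gpu_id
    return placement
```

Real orchestrators must also weigh latency targets, replica counts, and dynamic load, but the same bin-packing intuition underlies many of the design approaches the session discusses.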
Presented by Bhala Ranganathan, Microsoft