Models can generate remarkable predictions and valuable outputs, but their real worth depends on how accessible they are to end-users. Manual deployment is often cumbersome and costly, which slows the delivery of models to those users.
Automated deployment is a promising alternative that delivers updated models rapidly, among other benefits. Efficient deployment mechanisms must prioritize resource utilization, speed, and scalability to give users a robust experience. Pipelines, containerization, microservice architectures, and auto-scaling infrastructure are among the options for optimizing deployments. In this workshop, we take an industry-centric look at model deployment and system architecture, drawing on minoHealth AI's deployment practices to showcase the efficiency gains that a well-designed deployment can achieve. We also share insights on optimizing inference and generation, and on scalability considerations for deployments.
Intermediate level
No prerequisites