2024-08-21 – Scotiabank Lecture Hall
As businesses increasingly adopt Large Language Models (LLMs), deploying these systems in production gives rise to a unique set of challenges. In this talk we will dive into the details of LLM deployment, highlighting key issues such as scalability, fine-tuning for specific tasks, and resource utilization. We will also explore LLMOps, a strategic approach for managing the lifecycle of LLMs (essentially MLOps adapted for LLMs) to ensure efficient development, deployment, and maintenance of models in production, and compare it with traditional CI/CD practices to make it easier to understand how to manage LLM operations effectively. The talk will conclude with general tips and takeaways for businesses starting out with LLMs, and key concerns to be mindful of when optimizing LLM performance in their applications.
Aim: The talk is intended to serve as a guide to understanding the challenges of production deployment and to offer actionable strategies for effectively deploying and managing LLMs in a business.
In this talk, I will provide an in-depth look at the world of Large Language Models (LLMs) and their deployment challenges. We will start with a brief introduction to LLMs, discussing popular current models from leading providers such as OpenAI’s GPT, Microsoft’s Copilot, Cohere, and Meta’s LLaMA, and highlighting their distinguishing features and relevance across various sectors.
We will then explore the major challenges of deploying LLMs in production, focusing on scalability, fine-tuning, and resource consumption. We will also look at why businesses should be mindful of these challenges, the visible impact of these issues, and the importance of efficient resource management for LLM applications. The importance of domain-specific fine-tuning, and its potential pitfalls, will also be discussed; a minimal sketch of what such fine-tuning can look like follows below.
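To make the fine-tuning discussion concrete, here is a minimal sketch of parameter-efficient, domain-specific fine-tuning using LoRA adapters. It assumes the Hugging Face transformers and peft libraries; the base model name and hyperparameters are illustrative placeholders, not recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative base model (gated on Hugging Face; any causal LM works).
base = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all weights,
# which cuts GPU memory and cost: one answer to the resource challenge.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

The design point worth noting is that LoRA sidesteps full-weight updates, which is one of the main levers for keeping domain-specific fine-tuning within a reasonable GPU and cost budget.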
Next, I will introduce LLMOps, a framework designed to streamline the lifecycle management of LLMs. We will break down the architecture and components of LLMOps, including data orchestration, tuning pipelines, and deployment workflows, and I will briefly compare LLMOps with traditional CI/CD processes.
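As a taste of that comparison, here is a minimal, self-contained sketch of an LLMOps-style promotion gate. Where classic CI/CD promotes a build when its tests pass deterministically, an LLM pipeline typically promotes a model when an evaluation score clears the current baseline by a margin. The scorer and numbers below are illustrative stand-ins for a real evaluation harness.

```python
def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """Fraction of model outputs that exactly match the reference answers."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

def promote(candidate_score: float, baseline_score: float,
            margin: float = 0.02) -> bool:
    # LLM evals are noisy, so require a margin over the baseline
    # rather than deploying on any apparent improvement.
    return candidate_score >= baseline_score + margin

# Toy run: the candidate model answers 2 of 3 eval prompts correctly.
candidate = exact_match_score(["Paris", "4", "blue"], ["Paris", "4", "red"])
print(f"candidate score: {candidate:.2f}")                              # 0.67
print("deploy" if promote(candidate, baseline_score=0.60) else "hold")  # deploy
```

The margin is the part that differs most from a traditional test gate: LLM evaluations are statistical, so gating on "any improvement" invites noise-driven deployments.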
Before closing, we will discuss key concerns and strategies for monitoring and evaluating LLM performance, including prompt engineering. The talk will conclude with practical advice for getting started with LLMs, understanding costs, and leveraging human feedback.
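On the cost point, a back-of-the-envelope estimator is often the most useful first step. The sketch below assumes simple per-token API pricing; the rates are placeholders, so substitute your provider's current prices.

```python
def monthly_cost(requests_per_day: int, prompt_tokens: int,
                 completion_tokens: int,
                 price_in_per_1k: float = 0.0005,    # assumed $ per 1K input tokens
                 price_out_per_1k: float = 0.0015):  # assumed $ per 1K output tokens
    """Estimate monthly spend for API-based LLM usage."""
    per_request = ((prompt_tokens / 1000) * price_in_per_1k
                   + (completion_tokens / 1000) * price_out_per_1k)
    return per_request * requests_per_day * 30

# Example: 10,000 requests/day, 800-token prompts, 200-token answers.
print(f"${monthly_cost(10_000, 800, 200):,.2f}/month")  # -> $210.00
```

Even a rough model like this makes clear why prompt length and output caps are first-order levers when optimizing LLM performance and spend in production.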
Target audience: The talk is aimed at DevOps professionals, other technical and non-technical professionals, and students.
I am a passionate data and technology enthusiast currently pursuing a Master's in Applied Computer Science at Dalhousie University, set to graduate in around three months. My primary interests lie in AI/ML and cloud computing. Over my years working in industry, I have gained experience in computer vision, data analysis, visualization, and database management. Having moved from computer vision to generative AI, I have found that learning to build features with LLMs is both impactful and broadly useful.
I believe in practical, hands-on learning grounded in best practices. In my role as a teaching assistant at Dalhousie, I guide students in cloud computing, software development, and databases. I also enjoy building working demos and projects to sharpen my skills and create usable applications.