Department of Industrial Engineering & Decision Analytics [Joint IEDA/ISOM Seminar] - Foundations and Frontiers: Pioneering Large Language Model Inference in Operations Research

10:30am - 11:30am
Room 6573 [lift 29-30]

Large Language Model (LLM) inference refers to the computational techniques and strategies used to process input prompts and generate responses with a large language model. The field overlaps substantially with online optimization, particularly in online batching, scheduling, and resource allocation, which makes it a topic of keen interest to the OR community. In this talk, I will introduce two foundational models for managing computational tasks on a single GPU. The first model studies online separate batching and scheduling, where each batch contains either only prompt jobs (initial requests for processing) or only token jobs (subsequent computational tasks derived from prompts). We present an algorithm based on compensated coupling that achieves constant regret. The second model studies how to compose mixed batches in a stable environment; we propose a novel algorithm and establish its competitive-ratio guarantee through a combinatorial proof. Drawing on my experience at Microsoft Research and Azure, I will also highlight emerging trends and future research directions in this evolving field.
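For readers new to the setting, the difference between separate and mixed batching can be illustrated with a toy sketch. It is not the speaker's algorithm: neither the compensated-coupling policy nor the competitive-ratio algorithm is reproduced here. The `Job` class, the job sizes, the `capacity` budget, and the greedy rule that serves whichever job type has more pending jobs are all hypothetical assumptions chosen only to show the batching constraint.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical job model for illustration only: a "prompt" job is the initial
# processing of a request, a "token" job is one subsequent generation step.
JobKind = Literal["prompt", "token"]


@dataclass
class Job:
    request_id: int
    kind: JobKind
    size: int  # assumed resource usage on the GPU (arbitrary units)


def pack_fifo(jobs: List[Job], capacity: int) -> List[Job]:
    """Add jobs in queue order, skipping any job that would exceed capacity."""
    batch, used = [], 0
    for job in jobs:
        if used + job.size <= capacity:
            batch.append(job)
            used += job.size
    return batch


def separate_batch(queue: List[Job], capacity: int) -> List[Job]:
    """Separate batching: the batch holds only prompt jobs or only token jobs.

    Hypothetical rule: pick the job type with more pending jobs (ties go to
    prompts), then pack jobs of that type in FIFO order.
    """
    prompts = [j for j in queue if j.kind == "prompt"]
    tokens = [j for j in queue if j.kind == "token"]
    chosen = prompts if len(prompts) >= len(tokens) else tokens
    return pack_fifo(chosen, capacity)


def mixed_batch(queue: List[Job], capacity: int) -> List[Job]:
    """Mixed batching: prompt and token jobs may share a batch."""
    return pack_fifo(queue, capacity)


if __name__ == "__main__":
    queue = [
        Job(1, "prompt", 4),
        Job(2, "token", 1),
        Job(3, "token", 1),
        Job(4, "prompt", 3),
        Job(5, "token", 1),
    ]
    print([j.request_id for j in separate_batch(queue, capacity=5)])  # [2, 3, 5]
    print([j.request_id for j in mixed_batch(queue, capacity=5)])     # [1, 2]
```

With a capacity of 5, the separate batcher serves only the three pending token jobs, while the mixed batcher can pair the large prompt job with a token job in the same batch.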

Event Format
Speakers / Performers:
Mr. Zijie ZHOU
Massachusetts Institute of Technology (MIT), LIDS & ORC

My name is Zijie Zhou. I am a rising fourth-year PhD candidate at the MIT Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC), and I will be entering the academic job market this fall. My advisors are Professor Patrick Jaillet from MIT EECS and Professor Chara Podimata from MIT Sloan. My research expertise lies in online algorithm design, online optimization, and experiment design. This summer, I am interning at Microsoft Research and Azure, where I am focused on developing foundational models of LLM inference for the operations research community. In the summer of 2023, I interned at Oracle Labs, where I designed robust booking-limit and dynamic upgrading mechanisms for the hospitality industry.

Language
English
Recommended For
Faculty and staff
PG students
Organizer
Department of Industrial Engineering & Decision Analytics
Department of Information Systems, Business Statistics & Operations Management