Department of Industrial Engineering & Decision Analytics [Joint IEDA/ISOM Seminar] - Foundations and Frontiers: Pioneering Large Language Model Inference in Operations Research

10:30am - 11:30am
Room 6573 [lift 29-30]

Large Language Model (LLM) inference comprises the computational techniques and strategies used to process input prompts and generate responses with a large language model. The field intersects significantly with online optimization, particularly online batching, scheduling, and resource allocation, which makes it a topic of keen interest to the OR community. In this talk, I will introduce two foundational models for managing computational tasks on a single GPU. The first model explores online separate batching and scheduling, where each batch contains exclusively either prompt jobs (initial requests for processing) or token jobs (subsequent computational tasks derived from prompts). We present an algorithm based on compensated coupling that achieves constant regret. The second model studies the composition of mixed batches in a stable environment and proposes a novel algorithm whose competitive ratio is guaranteed by a combinatorial proof. Drawing on my experience at Microsoft Research and Azure, I will also highlight emerging trends and future research directions in this evolving field.

Speaker/Performer:
Mr. Zijie ZHOU
Massachusetts Institute of Technology (MIT), LIDS & ORC

My name is Zijie Zhou. I am a rising fourth-year PhD candidate at the MIT Laboratory for Information and Decision Systems (LIDS) and Operations Research Center (ORC), and I will be entering the academic job market this fall. My advisors are Professor Patrick Jaillet from MIT EECS and Professor Chara Podimata from MIT Sloan. My research expertise lies in online algorithm design, online optimization, and experiment design. This summer, I am interning at Microsoft Research and Azure, where I focus on developing foundational models of LLM inference for the operations research community. In the summer of 2023, I interned at Oracle Lab, where I designed robust booking limit and dynamic upgrading mechanisms for the hospitality industry.

Language
English
Recommended For
Faculty and staff
Postgraduate students
Organizer
Department of Industrial Engineering & Decision Analytics
Department of Information Systems, Business Statistics and Operations Management