Department of Industrial Engineering & Decision Analytics [Joint IEDA/ISOM] Seminar - Towards Reliable and Efficient LLM Systems: Operations Research for Inference Optimization and Multi-Agent Evaluation

10:30am - 11:30am
Room 5583 [lift 29-30]


     Large language models (LLMs) are becoming core infrastructure for many applications, but large-scale deployment is hampered by high inference costs and by the difficulty of evaluating complex, tool-using workflows reliably rather than heuristically. We take an Operations Research approach to these system-level challenges from two angles: improving inference efficiency under memory constraints, and designing trustworthy evaluation methodologies for multi-agent, tool-driven LLM systems.

     In the first part, we study LLM inference as a multi-stage online scheduling problem in which jobs grow their key–value (KV) cache over time under a hard GPU memory budget. We derive a fluid approximation of the system dynamics that yields an explicit upper bound on achievable throughput, and use it to design a threshold-based policy, WAIT, for the setting with known output lengths. We then extend this design to Nested WAIT for unknown output lengths, which classifies jobs on the fly via a nested segmentation of the decode process. Both policies are asymptotically near-optimal, approaching the fluid throughput bound while keeping latency and time-to-first-token under control.
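     To make the threshold idea concrete, the following minimal Python sketch implements a WAIT-style admission rule for the known-output-length setting: arrivals are held until a batch threshold is reached, and a batch is admitted only if the worst-case (completion-time) KV footprint of all jobs fits the memory budget. The names and the specific admission test (WaitScheduler, batch_threshold, the peak-memory check) are illustrative assumptions, not the exact policy from the talk.

    from dataclasses import dataclass, field

    @dataclass
    class Job:
        prompt_len: int   # KV tokens held after prefill
        output_len: int   # output length, known in advance in this setting
        decoded: int = 0  # tokens generated so far

    @dataclass
    class WaitScheduler:
        memory_budget: int    # hard GPU KV-cache budget, in tokens
        batch_threshold: int  # hypothetical: admit only once this many jobs queue
        running: list = field(default_factory=list)
        queue: list = field(default_factory=list)

        def submit(self, job: Job) -> None:
            self.queue.append(job)

        def step(self) -> None:
            # WAIT rule: hold arrivals until the threshold is met, then admit
            # the batch only if its completion-time KV footprint fits the
            # budget, so memory can never overflow mid-decode.
            if len(self.queue) >= self.batch_threshold:
                batch = self.queue[: self.batch_threshold]
                peak = sum(j.prompt_len + j.output_len
                           for j in self.running + batch)
                if peak <= self.memory_budget:
                    self.running.extend(batch)
                    del self.queue[: self.batch_threshold]
            # One decode iteration: every running job adds one KV token.
            for j in self.running:
                j.decoded += 1
            # Finished jobs release their cache.
            self.running = [j for j in self.running if j.decoded < j.output_len]

     Stepping this loop with synthetic jobs shows the intent of the design: admissions are sized against the worst case, so no job ever has to be evicted for memory mid-decode.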

     In the second part, we focus on reliable evaluation of multi-step, tool-using LLM systems. We introduce PILOT-Bench, a benchmark built around a general Markov decision process (MDP) framework that models workflow execution with stochastic tools, imperfect guidance, and realistic API failures. In this framework, a carefully specified MDP both defines the optimal workflow guidance for rich task families—computed via reinforcement learning—and provides an MDP-optimal path against which we can assess reliability under flawed human- or LLM-designed workflows, tool-call failures, and noisy or incomplete data. We use this MDP to generate optimal execution policies and systematically perturbed workflows, enabling principled, workflow-level evaluation of models’ planning, error recovery, and tool-use robustness across thousands of real-world tasks and a rich tool library.
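     As a toy illustration of how an MDP pins down an optimal workflow that perturbed executions can be judged against, the short Python sketch below runs value iteration on a hypothetical two-tool task with stochastic API failures, then reads off the greedy (MDP-optimal) tool choice in each state. The states, tools, rewards, and success probabilities are invented for illustration; they are not PILOT-Bench's actual task or tool specification.

    STATES = ["start", "data_fetched", "done", "failed"]
    ACTIONS = {
        "start": ["fetch_api", "fetch_cache"],
        "data_fetched": ["summarize"],
    }
    # (state, action) -> list of (probability, next_state, reward);
    # the failure branches model unreliable API calls.
    P = {
        ("start", "fetch_api"):   [(0.9, "data_fetched", -1.0), (0.1, "failed", -1.0)],
        ("start", "fetch_cache"): [(0.6, "data_fetched", -0.2), (0.4, "start", -0.2)],
        ("data_fetched", "summarize"): [(0.95, "done", 10.0), (0.05, "failed", 0.0)],
    }

    def value_iteration(gamma: float = 0.99, iters: int = 200) -> dict:
        V = {s: 0.0 for s in STATES}  # terminal states keep value 0
        for _ in range(iters):
            for s, acts in ACTIONS.items():
                V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                           for a in acts)
        return V

    def greedy_policy(V: dict, gamma: float = 0.99) -> dict:
        return {s: max(acts, key=lambda a: sum(p * (r + gamma * V[s2])
                                               for p, s2, r in P[(s, a)]))
                for s, acts in ACTIONS.items()}

    V = value_iteration()
    print(greedy_policy(V))  # the MDP-optimal tool choice in each state

     The optimal policy then serves as the reference: a perturbed workflow (say, one whose guidance forces fetch_cache first, or whose tool failure rates are raised) can be scored by how far its realized return falls below the MDP-optimal value.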

Speakers / Performers:
Mr. Ruicheng AO
Massachusetts Institute of Technology (MIT)

Ruicheng Ao is a third-year PhD candidate at MIT's Institute for Data, Systems, and Society, advised by Prof. David Simchi-Levi; he also works with Prof. Thomas Magnanti. His research lies at the intersection of operations research, statistics, and large language models. He applies rigorous methodologies from optimization and stochastic analysis to improve LLM system performance, while also investigating how LLMs can revolutionize decision-making in classical operations research domains such as supply chain management and experimental design. His work has received honorable mention awards in the INFORMS Applied Probability Society and Revenue Management and Pricing Best Student Paper Competitions, and he has twice received the Excellence Award in the Alibaba Global Mathematics Competition.

Language
English
Recommended For
Alumni
Faculty and staff
PG students
Organizer
Department of Industrial Engineering & Decision Analytics
Department of Information Systems, Business Statistics & Operations Management