Department of Industrial Engineering & Decision Analytics [Joint IEDA/ISOM] seminar - Data collection and decision making with imperfect predictions from LLMs
Digital platforms often rely on human-response data to evaluate services and make operational decisions, yet such data are frequently costly, limited, or incomplete. Large language models (LLMs) provide a new source of low-cost and scalable predictions about individual behavior, but these predictions can be inaccurate and systematically biased.
In this talk, I study how to use imperfect LLM predictions to support reliable platform decisions at a lower cost. I discuss two problems. The first is how to use LLM predictions to guide data collection. We develop an algorithm that allocates data collection effort under a fixed budget and minimizes the estimation error of key quantities. The second is how to use LLM predictions after data collection to address missing outcomes. In this setting, we introduce a structural framework that uses LLM outputs as auxiliary information rather than as direct substitutes for human responses, enabling reliable inference despite imperfect predictions. Together, these results show how LLMs can improve data-driven decision-making when they are used carefully as helpful but imperfect sources of information.
Hongyu Chen is a fourth-year PhD student at MIT, advised by Prof. David Simchi-Levi. His research develops methodologically rigorous frameworks for data collection and decision-making in complex environments, with a particular focus on leveraging generative AI as a data augmentation tool to improve efficiency, reliability, and economic value. Before joining MIT, he received his bachelor’s degree in Statistics and Economics from Peking University.