Department of Industrial Engineering & Decision Analytics [IEDA Seminar] - Robust Regret Markov Decision Processes
In recent years, robust Markov decision processes (MDPs) have emerged as a popular alternative to standard MDPs because they do not require precise knowledge of model parameters. Robust MDPs implement the maximin principle and optimize worst-case performance across plausible model parameters. While robust MDPs yield reliable policies under limited data, these policies are often overly conservative. In this talk, we adopt the regret decision criterion and introduce a novel framework called robust regret MDPs, which optimizes the sum of stepwise regret under model ambiguity. We derive the dynamic programming formulation for our model and provide a tractable convex reformulation of the robust regret Bellman update, so that it can be computed with off-the-shelf solvers. To further improve computational efficiency, we develop a tailored algorithm that computes the robust regret Bellman update in quasi-linear time. Our experiments show that the proposed algorithm outperforms state-of-the-art solvers by several orders of magnitude.
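For readers unfamiliar with the robust MDP setting the talk builds on, the sketch below illustrates the standard worst-case (maximin) Bellman update over a finite ambiguity set of transition models. This is background only, not the speaker's robust regret algorithm; the toy MDP, the finite ambiguity set, and all numbers are made-up assumptions for illustration.

```python
import numpy as np

# Toy robust MDP: 3 states, 2 actions, and an ambiguity set given by a
# handful of plausible transition kernels (a simplification; ambiguity
# sets are often described by norm balls or divergence constraints).
n_states, n_actions = 3, 2
gamma = 0.9
rng = np.random.default_rng(0)

def random_kernel():
    # Random row-stochastic transition tensor P[a, s, s'].
    P = rng.random((n_actions, n_states, n_states))
    return P / P.sum(axis=2, keepdims=True)

models = [random_kernel() for _ in range(4)]  # ambiguity set
R = rng.random((n_actions, n_states))         # reward per (action, state)

def robust_bellman(v):
    # For each (state, action): worst-case expected continuation value
    # over the ambiguity set (the "min" of maximin), then maximize
    # over actions (the "max").
    q = np.empty((n_actions, n_states))
    for a in range(n_actions):
        worst = np.min([P[a] @ v for P in models], axis=0)
        q[a] = R[a] + gamma * worst
    return q.max(axis=0)

# Robust value iteration: the robust Bellman operator is a gamma-
# contraction, so iterating it converges to the robust value function.
v = np.zeros(n_states)
for _ in range(500):
    v_new = robust_bellman(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
```

The robust regret framework discussed in the talk replaces the worst-case value objective above with a stepwise-regret objective, which is what motivates the convex reformulation and the quasi-linear-time update.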
Clint Chin Pang Ho is an Assistant Professor in the School of Data Science at the City University of Hong Kong. He received a BS in Applied Mathematics from the University of California, Los Angeles (UCLA), an MSc in Mathematical Modelling and Scientific Computing from the University of Oxford, and a PhD in computational optimization from Imperial College London. Before joining CityU, Clint was a Junior Research Fellow at Imperial College Business School.
Clint's current research focuses on decision making under uncertainty. He studies optimization algorithms and computational methods for structured problems, as well as their applications in operations research and machine learning.