Department of Mathematics - Seminar on Statistics and Data Science - Exponential Lower Bounds and Fast Convergence for Policy Optimization

11:00am - 12:00pm
https://hkust.zoom.us/j/94883840530 (Passcode: hkust)

Policy gradient (PG) methods and their variants lie at the heart of modern reinforcement learning. Due to the intrinsic non-concavity of value maximization, however, the theoretical underpinnings of PG-type methods have remained limited until recently. In this talk, we discuss both the ineffectiveness and effectiveness of nonconvex policy optimization. On the one hand, we demonstrate that the popular softmax policy gradient method can take exponential time to converge. On the other hand, we show that employing natural policy gradients and enforcing entropy regularization allows for fast global convergence.
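For readers unfamiliar with the method the talk analyzes, the following is a minimal, self-contained sketch (not from the talk) of softmax policy gradient ascent on a single-state bandit problem; the rewards, step size, and iteration count are illustrative choices, not parameters from the speaker's results.

```python
import numpy as np

# Minimal illustrative sketch: exact softmax policy gradient ascent on a
# 3-armed bandit. All numbers here (rewards r, step size eta, number of
# iterations) are hypothetical and chosen only for demonstration.
r = np.array([1.0, 0.8, 0.0])   # per-action rewards
theta = np.zeros(3)             # softmax logits parameterizing the policy
eta = 0.1                       # step size

for _ in range(2000):
    pi = np.exp(theta - theta.max())        # softmax policy (stabilized)
    pi /= pi.sum()
    value = pi @ r                          # expected reward under pi
    # Exact gradient of the value w.r.t. the logits:
    #   d(value)/d(theta_a) = pi_a * (r_a - value)
    theta += eta * pi * (r - value)

pi = np.exp(theta - theta.max())
pi /= pi.sum()
print(pi)  # the policy concentrates mass on the best arm
```

Even in this toy setting one can see the dynamics the talk studies: the gradient of a given action shrinks with its probability `pi_a`, which is the mechanism behind the slow-convergence phenomena for softmax PG and the motivation for natural-gradient and entropy-regularized variants.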

Speaker / Performer:
Prof. Yuting WEI
The Wharton School, University of Pennsylvania

Yuting Wei is currently an assistant professor in the Statistics and Data Science Department at the Wharton School, University of Pennsylvania. Prior to that, Yuting spent two years at Carnegie Mellon University as an assistant professor of statistics, and one year at Stanford University as a Stein Fellow. She received her Ph.D. in statistics from the University of California, Berkeley. She is the recipient of the 2022 NSF CAREER Award and the 2018 Erich L. Lehmann Citation from the Berkeley statistics department. Her research interests include high-dimensional and non-parametric statistics, statistical machine learning, and reinforcement learning.

Language
English
Intended audience
Alumni
Faculty and staff
Postgraduate students
Undergraduate students
Organizer
Department of Mathematics