Towards Robust, Efficient and Practical Decision Making: From Reward-Maximizing Deep Reinforcement Learning to Reward-Matching GFlowNets

10:00am - 11:00am
at ECE conference room 2515-2516 (2/F via lifts 25/26), Academic Building, Clear Water Bay, Kowloon

Recent years have witnessed the great success of RL with deep feature representations in many challenging tasks, including computer games, robotics, smart cities, and more. Yet, focusing solely on the optimal solution under a reward proxy and learning a reward-maximizing policy is not enough. Diversity of the generated states is desirable in a wide range of important practical scenarios such as drug discovery, recommender systems, and dialogue systems. For example, in molecule generation, the reward function used in in-silico simulations may itself be uncertain and imperfect (compared to the more expensive in-vivo experiments). It is therefore not sufficient to search only for the solution that maximizes the return. Instead, we would like to sample many high-reward candidates, which can be achieved by sampling terminal states proportionally to their rewards. The Generative Flow Network (GFlowNet) is a probabilistic framework proposed by Yoshua Bengio in 2021 in which an agent learns a stochastic policy for object generation such that the probability of generating an object is proportional to a given reward function, i.e., a reward-matching policy. Its effectiveness has been demonstrated in discovering high-quality and diverse solutions in molecule generation, biological sequence design, and other domains. This talk covers my recent research on three important challenges in such decision-making systems. First, how can we ensure robust learning behavior and value estimation in the agent? Second, how can we improve its learning efficiency? Third, how can we successfully apply these methods in important practical settings such as computational sustainability problems and drug discovery?
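To make the distinction concrete, the following is a minimal, self-contained sketch (not from the talk; the toy objects and reward values are illustrative assumptions) contrasting a reward-maximizing choice with GFlowNet-style reward-matching sampling, where each terminal object x is drawn with probability R(x) / Σ R(x'):

```python
import random

# Toy rewards for candidate objects (e.g., generated molecules).
# These names and values are purely illustrative.
rewards = {"mol_A": 8.0, "mol_B": 7.5, "mol_C": 7.0, "mol_D": 0.1}

def reward_maximizing(rewards):
    """Standard RL-style objective: always pick the single
    highest-reward object, ignoring other good candidates."""
    return max(rewards, key=rewards.get)

def reward_matching_sample(rewards, rng=random):
    """Reward-matching objective: sample object x with probability
    R(x) / sum of all rewards, so several distinct high-reward
    candidates are all generated frequently."""
    total = sum(rewards.values())
    draw = rng.uniform(0.0, total)
    cumulative = 0.0
    for obj, rew in rewards.items():
        cumulative += rew
        if draw <= cumulative:
            return obj
    return obj  # guard against floating-point edge cases

# reward_maximizing always returns "mol_A"; reward_matching_sample
# returns mol_A, mol_B, and mol_C each with substantial probability,
# while the near-zero-reward mol_D is rarely sampled.
```

Note that a real GFlowNet learns a sequential constructive policy whose terminal distribution matches the reward, rather than sampling from an enumerated table as above; the sketch only illustrates the target distribution.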

Speaker:
Dr. Ling PAN
Postdoctoral Researcher, Montreal Institute for Learning Algorithms (MILA), Montréal, Canada

Ling Pan is a postdoctoral fellow at MILA, supervised by Prof. Yoshua Bengio. She received her Ph.D. in 2022 from the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, headed by Prof. Andrew Yao. Her research focuses on the algorithmic foundations and practical applications of generative flow networks (GFlowNets; Bengio et al., 2021), reinforcement learning, and multi-agent systems, with an emphasis on developing robust, efficient, and practical deep reinforcement learning algorithms. During her Ph.D., she visited Stanford University (working with Prof. Tengyu Ma), the University of Oxford (working with Prof. Shimon Whiteson), and the Machine Learning Group at Microsoft Research Asia (working with Dr. Wei Chen). She was a recipient of the Microsoft Research Asia Fellowship (2020).