Information Hub Seminar - Eureka: Human-Level Reward Design via Coding Large Language Models

11:00am - 12:30pm

Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and safety of the generated rewards without updating the model. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate, for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.
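
For readers new to the idea, the "evolutionary optimization over reward code" mentioned in the abstract can be pictured with a toy loop like the one below. This is only an illustrative sketch, not the Eureka implementation. The names `propose`, `fitness`, `evolve`, `TEMPLATE`, and `STATES` are hypothetical. A random mutation stands in for the LLM, a fixed code template is used purely to keep the example small (the abstract notes that Eureka itself uses no reward templates), and a closed-form score stands in for reinforcement learning training and evaluation.

```python
import random

# Toy reward program. Assumption: a fixed template keeps the sketch short;
# the abstract states that Eureka does not rely on reward templates.
TEMPLATE = "def reward(dist, speed):\n    return -dist + {w} * speed\n"

# Hypothetical toy states (distance to goal, speed) used to score a reward.
STATES = [(d / 4.0, s / 4.0) for d in range(5) for s in range(5)]


def propose(parent_w, n, rng):
    """Stand-in for the LLM call: return n candidate reward programs.

    With no parent, sample from scratch; otherwise mutate the best so far.
    """
    candidates = []
    for _ in range(n):
        if parent_w is None:
            w = rng.uniform(-1.0, 1.0)
        else:
            w = parent_w + rng.gauss(0.0, 0.1)
        w = round(w, 3)
        candidates.append((w, TEMPLATE.format(w=w)))
    return candidates


def compile_reward(source):
    """Turn reward source code into a callable."""
    namespace = {}
    exec(source, namespace)
    return namespace["reward"]


def fitness(reward_fn):
    """Stand-in for RL training plus task evaluation (higher is better).

    Assumption: the 'true' task score is -dist + 0.5 * speed, so fitness is
    the negative mean squared gap between the candidate and that score.
    """
    errors = [(reward_fn(d, s) - (-d + 0.5 * s)) ** 2 for d, s in STATES]
    return -sum(errors) / len(errors)


def evolve(iterations=5, samples=8, seed=0):
    """Keep the best candidate and feed it back into the next proposal round."""
    rng = random.Random(seed)
    best_w, best_src, best_score = None, None, float("-inf")
    history = []
    for _ in range(iterations):
        for w, src in propose(best_w, samples, rng):
            score = fitness(compile_reward(src))
            if score > best_score:
                best_w, best_src, best_score = w, src, score
        history.append(best_score)
    return best_src, best_score, history


if __name__ == "__main__":
    src, score, history = evolve()
    print(src)
    print("best fitness per iteration:", history)
```

Because the best candidate is carried over between rounds, the best fitness never decreases from one iteration to the next. In the real system, each fitness evaluation would be a full reinforcement learning run rather than a formula.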

Speaker/Performer:
Yecheng (Jason) Ma
University of Pennsylvania

Jason Ma is a fourth-year PhD student at the University of Pennsylvania, advised by Dinesh Jayaraman and Osbert Bastani. His research interests span the intersection of robot learning and reinforcement learning, with an emphasis on learning and deploying foundation models for robotics. During his PhD, Jason has interned at NVIDIA AI and Meta AI, where he conducted high-impact research on foundation models for robotics that has received critical acclaim from both academia and industry. His latest work, Eureka, has received extensive coverage from the popular press, including Yahoo, TechCrunch, Wired, and VentureBeat, and more than 100 media outlets in total. Jason has 8 first-author publications at top AI conferences (NeurIPS, ICML, ICLR, AAAI, ICCV, ...). Prior to Penn, Jason completed his bachelor's degree at Harvard University, where he received the highest honors distinction in Computer Science and was a winner of the Thomas Temple Hoopes Prize for an outstanding undergraduate thesis.

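Language
English
Recommended for
Alumni
Seniors
Faculty and staff
General public
HKUST Family
Postgraduate students
Undergraduate students
Organizer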
Information Hub, HKUST(GZ)