Human Commonsense and Physical Reasoning for Robot Learning
Humans possess an array of sensory and reasoning abilities: our eyes perceive and comprehend the 3D world, our brains plan actions with inherent knowledge to navigate complex tasks, and our hands interact with objects to estimate their physical properties. The reasoning capabilities of current robots, however, still lag far behind those of humans. In this talk, I will discuss how to imbue robots with human commonsense and physical reasoning capabilities. I will first introduce multimodal commonsense reasoning for robot learning, which integrates vision world models and language-based task planners to parse visual scenes, decompose complex tasks, and plan actions, serving as the foundation for robots to mimic human behaviors. Beyond vision and language, I will delve into physical reasoning, which enables robots to infer the dynamics and physical properties of objects, such as mass, friction, and fragility, thereby making robotic manipulation more precise and generalizable. Together, these components constitute robotic commonsense reasoning and highlight the potential of this interdisciplinary field. Despite the remaining challenges, I conclude optimistically: as more modalities and reasoning capabilities are integrated into robot learning, robots will generalize better across everyday tasks and scenarios, bringing greater social benefit and impact.
Mingyu Ding is currently a postdoctoral fellow at UC Berkeley, working with Prof. Masayoshi Tomizuka. Previously, he was a visiting scholar at MIT and obtained his PhD from the University of Hong Kong. Mingyu's research objective is to develop robots and embodied agents that can perceive, reason about, and interact effectively with the 3D physical world as humans do, by integrating insights from interdisciplinary domains including robotics, vision & language, and brain science. He has published over 40 papers at leading AI and robotics conferences, accumulating over 2000 citations on Google Scholar. His work has been recognized with multiple research awards, such as the Baidu Fellowship, a Microsoft Fellowship Nomination, and Rising Star Awards in both the ME and AI fields, for his contributions to improving the generalization of robotic manipulation toward general-purpose robot intelligence.