CKSRI Seminar Series 2026 “World Modeling from Computer Vision”

Name: CKSRI Seminar Series 2026 “World Modeling from Computer Vision”
Start: 2026-05-26
End: 2026-05-26
Location: Room 4580

11:00am - 12:00pm

Abstract
Large language models have given us AI that reads, writes, and codes, yet they have not given us a robot that can fold laundry. Closing this gap is the next major frontier in AI, and it is precisely the problem world models exist to solve: these are systems that learn from raw observation how the world looks, how it evolves, and how it responds to an agent's actions.

The critical question is where such systems originate. While the popular view treats world models as a novel paradigm distinct from computer vision, I argue that world models are inherently rooted in computer vision. The large-scale representation learning, generative modeling, and spatiotemporal attention machinery developed over the past decade of vision research are exactly what we need to perceive, simulate, and act. World modeling is not what comes after computer vision; it is what computer vision becomes when we unify reconstruction, simulation, and action.

I will make this concrete through three interconnected works. First, LingBot-Map establishes the perception backbone, replacing hand-crafted SLAM with an end-to-end attention model that embeds geometric priors, recovering 3D structure from continuous video at scales legacy pipelines cannot achieve. Second, LingBot-World introduces the dynamics layer through neural world simulation, moving beyond static geometry to model the causal and physical evolution of scenes. Finally, LingBot-VA translates this world model into action by jointly modeling dynamics and control within a single autoregressive framework, turning a passive representation into an embodied controller.

Ultimately, world modeling represents the next phase of AI, and computer vision is the discipline that builds it.

Event Format

Seminar, Lecture, Talk

Speakers / Performers:

Prof. Yinghao XU

HKUST Robotics Institute

Yinghao Xu is an Assistant Professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology (HKUST). Previously, he was a Staff Research Scientist at RobbyAnt, working on world models and embodied AI. Before that, he was a postdoctoral researcher at Stanford University. His research lies at the intersection of 3D computer vision, generative AI, and embodied AI, with a recent focus on building world models that unify 3D reconstruction, world simulation, and embodied action. He was the recipient of the Yunfan Award at WAIC 2024 and was nominated for the Snap Fellowship in 2022.

Language

English

Recommended For

Alumni

Elderly

Faculty and staff

General public

HKUST Family

PG students

UG students

Organizer

Cheng Kar-Shun Robotics Institute

Engineering

Science & Technology