Bridging Vision and Language for Cross-Modal Understanding and Generation

Start: 2022-01-25
End: 2022-01-25
Location: ZOOM: https://hkust.zoom.us/j/94180568468 Meeting ID: 941 8056 8468 Passcode: 304742

25 January 2022

9:30am - 10:30am



Although substantial progress has been made in both computer vision and natural language processing over the past decade, bridging vision and language remains a fundamental and challenging problem for advancing artificial intelligence. Vision is the primary means by which we perceive the world, while language carries high-level semantic information and abstract knowledge for communication and reasoning. Bridging these complementary modalities not only benefits representation learning in each modality, but also enables future AI systems to unify perception, communication, reasoning, and creation abilities.

In this talk, I will introduce my research on vision-language cross-modal understanding and generation. Understanding cross-modal information will enable AI systems to process and learn from more complex information and to interact better with humans. I will first introduce my work on vision-language cross-modal understanding, including discriminative image captioning, visual grounding, text-image retrieval, and learning visual representations from language supervision. Beyond perception and understanding, the abilities to create and imagine represent more advanced forms of intelligence. In particular, synthesizing images from text instructions allows fine-grained, user-friendly control over visual content creation and editing. In the second part of my talk, I will introduce my contributions to vision-language cross-modal generation, including the first open-domain, open-vocabulary language-based image editing algorithm, the first unified framework for language-guided and image-guided image synthesis, the first benchmark for compositional text-to-image synthesis, and an algorithm for semantic image synthesis. Finally, I will discuss my future research plans.

Event Format

Seminar, Talk, Lecture

Speaker/Performer:

Dr. Xihui Liu

Postdoctoral Scholar, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley

Dr. Xihui Liu is a Postdoctoral Scholar in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, advised by Prof. Trevor Darrell. Before that, she received her Ph.D. degree from the Department of Electronic Engineering at The Chinese University of Hong Kong and her B.Eng. degree from the Department of Electronic Engineering at Tsinghua University. Her research interests span computer vision, natural language processing, machine learning, and artificial intelligence, with a special focus on the intersection of vision and language. She received the Adobe Research Fellowship in 2020 and was named an MIT EECS Rising Star in 2021.

Language

English

Target Audience

Faculty and Staff

Postgraduate Students

Organizer

Department of Electronic and Computer Engineering

Engineering

Science and Technology