AI Thrust Seminar | Translating Across Modalities: Innovations in Visual-to-Text and Text-to-Visual Generation

9:00am - 10:00am
Zoom Meeting ID: 913 6608 1819, Passcode: 488490

Cross-modal generation is an important task under the generative AI umbrella; my work focuses on visual-to-text and text-to-visual generation. Translating semantic information freely across modalities poses two main challenges: (1) handling unrecognizable visual instances, and (2) generating complex, controllable content with high quality. This talk introduces solutions from several perspectives. First, approaches for unsupervised language structure inference and for uncovering domain-specific concepts will be discussed, which enhance the performance of visual-to-text generation models. Then, to simultaneously achieve high-fidelity visual generation and cross-modal semantic matching, inversion and online alignment frameworks will be presented. These research findings have been validated in a variety of scenarios and show promise for applications in domains such as game development and health care.

Speaker:
Mr. Hao WANG
A final-year PhD candidate in the School of Computer Science and Engineering, Nanyang Technological University, Singapore

Hao WANG is a final-year PhD candidate in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He received his B.E. degree from Huazhong University of Science and Technology. His research focuses on developing AI-powered perception and generation algorithms for the multimodal domain. In particular, his recent work investigates translation between visual and textual data to generate controllable content with efficiency and robustness. He has published first-authored work at top-tier conferences and in top-tier journals in the computer vision and multimedia fields, including CVPR, ECCV, ACM MM, IEEE TPAMI, IEEE TIP, and IEEE TMM.

Language
English
Intended audience
Faculty and staff
Public
Postgraduate students
Undergraduate students
Organizer
Artificial Intelligence Thrust, HKUST(GZ)