AI Thrust Seminar | Translating Across Modalities: Innovations in Visual-to-Text and Text-to-Visual Generation

9:00am - 10:00am
Zoom Meeting ID: 913 6608 1819, Passcode: 488490

Cross-modal generation is an important task under the generative AI umbrella; my work focuses on visual-to-text and text-to-visual generation. Translating semantic information freely across modalities poses two main challenges: (1) handling unrecognizable visual instances, and (2) generating complex, controllable content with high quality. This talk introduces solutions from several perspectives. First, approaches for unsupervised language structure inference and for uncovering domain-specific concepts will be discussed, which enhance the performance of visual-to-text generation models. Then, to simultaneously achieve high-fidelity visual generation and cross-modal semantic matching, inversion and online alignment frameworks will be presented. These research findings have been validated in a variety of scenarios and show promise for applications in domains such as game development and health care.

Speaker:
Mr. Hao WANG
A final-year PhD candidate in the School of Computer Science and Engineering, Nanyang Technological University, Singapore

Hao WANG is a final-year PhD candidate in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He received his B.E. degree from Huazhong University of Science and Technology. His research focuses on developing AI-powered perception and generation algorithms for the multimodal domain. In particular, his recent work investigates translation between visual and textual data to generate controllable content with efficiency and robustness. He has published first-authored work at top-tier conferences and in top-tier journals in the computer vision and multimedia fields, including CVPR, ECCV, ACM MM, IEEE TPAMI, IEEE TIP, and IEEE TMM.

Language
English
Intended audience
Faculty and staff
Public
Postgraduate students
Undergraduate students
Organizer
Artificial Intelligence Thrust, HKUST(GZ)