Department of Electronic and Computer Engineering Seminar - Probing CLIP's Comprehension of 360-Degree Textual and Visual Semantics
Supporting the following United Nations Sustainable Development Goals:
The dream of instantly creating rich 360-degree panoramic worlds from text is rapidly becoming a reality, yet a crucial gap remains in our ability to reliably evaluate the semantic alignment between a generated panorama and its text prompt. Contrastive Language-Image Pre-training (CLIP) models, the standard AI evaluators for such alignment, are predominantly trained on perspective image-text pairs, so it is an open question how well they understand the unique characteristics of 360-degree panoramic image-text pairs. In this talk, we will present some of our preliminary efforts to address this gap. This is joint work with Hai Wang (UCL), Xiaochen Yang (Glasgow), and Mingzhi Dong (Bath).
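For context on why CLIP serves as an evaluator: a CLIP alignment score is, at its core, the cosine similarity between the model's image and text embeddings. The sketch below illustrates only this scoring step, using random placeholder vectors; in practice the embeddings would come from a pretrained CLIP image encoder and text encoder, and how faithful the resulting score is for panoramic inputs is exactly the question the talk examines.

```python
import numpy as np

def clip_style_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine similarity of L2-normalised embeddings, as in CLIP-based evaluation."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)

# Placeholder 512-d embeddings; a real pipeline would obtain these from
# a pretrained CLIP model applied to a panorama and its caption.
rng = np.random.default_rng(0)
img = rng.standard_normal(512)
txt = rng.standard_normal(512)
print(f"alignment score: {clip_style_score(img, txt):.4f}")
```

The score lies in [-1, 1], with higher values indicating stronger image-text agreement under the model's embedding space.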
Jinghao Xue received the Dr.Eng. degree in signal and information processing from Tsinghua University in 1998, and the Ph.D. degree in statistics from the University of Glasgow in 2008. He is a Professor of Statistical Pattern Recognition in the Department of Statistical Science, University College London. His research interests include statistical pattern recognition, machine learning, and computer vision. He is a Senior Area Editor of the IEEE Transactions on Circuits and Systems for Video Technology.