Supporting the following United Nations Sustainable Development Goals:
Examination Committee
Prof Jianan QU, ECE/HKUST (Chairperson)
Prof Bertram SHI, ECE/HKUST (Thesis Supervisor)
Prof Ming LIU, ECE/HKUST
Abstract
In this thesis, we apply deep networks to facial expression analysis, addressing the problems of expression recognition and expression generation. In particular, we tackle the problem of insufficient training data for deep networks in facial expression analysis through transfer learning and novel network structure design.
For the recognition problem, we transfer and adapt deep networks pre-trained on ImageNet to the tasks of facial expression classification and action unit (AU) intensity estimation. First, we propose two automatic feature-map selection schemes, Facial-Occupancy Selection and AU-Selectivity Selection, which operate on features at the higher convolutional layers for facial expression classification. We then describe a Region of Interest (ROI)-based selection scheme for smile intensity estimation. Our results suggest that a substantial number of feature maps inside deep networks are selective to AUs, and that feature selection makes the system more robust and improves generalization. Second, we study the dynamics of AU recognition by proposing a spatio-temporal model based on a long short-term memory (LSTM) network for smile (AU12) intensity estimation. Incorporating temporal information greatly improves performance. Third, we address multi-pose AU intensity estimation with a multi-task network. We take the bottom layers of VGG16 pre-trained on ImageNet and fine-tune the overall multi-task structure to learn a shared representation for pose estimation and pose-dependent AU intensity estimation. Our results won the AU intensity estimation sub-challenge of FERA2017.
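As a rough illustration of the multi-task idea described above (a minimal sketch, not the exact FERA2017 submission), the shared VGG16 convolutional layers can feed two branches: one for head-pose classification and one for pose-dependent AU intensity regression. The layer sizes and the numbers of poses and AUs below are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

class MultiTaskAUNet(nn.Module):
    """Sketch: shared VGG16 features, pose branch + pose-dependent AU branch."""
    def __init__(self, num_poses=9, num_aus=7):  # counts are illustrative
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.backbone = vgg.features                 # shared convolutional layers
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.shared_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 1024),
            nn.ReLU(inplace=True),
        )
        self.pose_head = nn.Linear(1024, num_poses)  # pose classification logits
        # one set of AU-intensity outputs per pose; the estimated pose selects
        # which set to read out at test time
        self.au_heads = nn.Linear(1024, num_poses * num_aus)
        self.num_poses, self.num_aus = num_poses, num_aus

    def forward(self, x):
        feats = self.shared_fc(self.pool(self.backbone(x)))
        pose_logits = self.pose_head(feats)
        au_intensities = self.au_heads(feats).view(-1, self.num_poses, self.num_aus)
        return pose_logits, au_intensities

In this sketch, fine-tuning the whole structure jointly on pose labels and AU intensity labels lets the shared representation serve both tasks, which is the essence of the multi-task design; the loss weighting and output head details in the actual system may differ.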
For the generation problem, we propose the conditional difference adversarial autoencoder (CDAAE) for photo-realistic facial expression synthesis. The CDAAE takes a facial image of a previously unseen person and generates an image of that person’s face with a target facial expression. Despite a paucity of training data, the CDAAE can disambiguate changes due to identity from changes due to facial expression. It achieves this by adding a feedforward path to the autoencoder structure that connects low-level features at the encoder to features at the corresponding level of the decoder. Our results demonstrate that the CDAAE preserves identity information better than previous approaches when generating facial expressions for unseen subjects. We also show that the CDAAE can be used for facial expression interpolation and novel expression generation.
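The following minimal sketch shows only the key structural idea of the CDAAE: a label-conditioned encoder-decoder with a feedforward (skip) path from a low-level encoder feature map to the corresponding decoder level. The layer sizes, latent dimension, and number of expression labels are illustrative assumptions, and the adversarial prior on the latent code and the difference-learning objective are omitted.

import torch
import torch.nn as nn

class CDAAESketch(nn.Module):
    """Sketch of the skip-connected, label-conditioned autoencoder core."""
    def __init__(self, num_expressions=6, latent_dim=64):  # illustrative sizes
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU())   # 64 -> 32
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())  # 32 -> 16
        self.to_latent = nn.Sequential(nn.Flatten(), nn.Linear(64 * 16 * 16, latent_dim))
        self.from_latent = nn.Linear(latent_dim + num_expressions, 64 * 16 * 16)
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU())  # 16 -> 32
        # the last decoder level also receives the low-level encoder features via
        # the feedforward path, so identity detail passes through while the target
        # label controls the expression change
        self.dec1 = nn.ConvTranspose2d(32 + 32, 3, 4, 2, 1)               # 32 -> 64

    def forward(self, x, target_label_onehot):
        low = self.enc1(x)                              # low-level features (skip source)
        z = self.to_latent(self.enc2(low))              # latent code
        h = self.from_latent(torch.cat([z, target_label_onehot], dim=1))
        h = self.dec2(h.view(-1, 64, 16, 16))
        return torch.tanh(self.dec1(torch.cat([h, low], dim=1)))

Because subject-specific appearance can travel through the skip path rather than the latent code, the latent code and the expression label are free to encode the expression change, which is how, under these assumptions, the structure separates identity from expression with limited training data.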