Exploring Multitask and Transfer Learning Algorithms for Head Pose Estimation in Dynamic Multiview Scenarios
Elisa Ricci; Oswald Lanz
2017-01-01
Abstract
Considerable research progress in computer vision and multimodal analysis has made it possible to examine complex phenomena such as social interactions. An important cue for determining social interactions is the head pose of the interacting members. Most automated social interaction analysis methods have focused on round-table meetings, where head pose estimation (HPE) is easier because faces are captured at high resolution and the analyzed targets are static (seated); recent works, however, have examined unstructured meeting scenes such as cocktail parties. Unstructured meeting scenes, where targets are free to move, provide additional cues for behavior analysis, such as proxemics, but they are also challenging to analyze owing to (i) the need to use distant, large field-of-view cameras, which can capture only low-resolution faces of targets, and (ii) the variations in targets' facial appearance as they move, owing to changing camera perspective and scale. This chapter reviews recent works addressing HPE under target motion. In particular, we examine the use of transfer learning and multitask learning for HPE. Transfer learning is particularly useful when the training and test data have different attributes (e.g., the training data contain pose annotations for static targets, but the test data involve moving targets), while multitask learning can be explicitly designed to address facial appearance variations under motion. Extensive experiments performed using both methodologies are presented.