IMPACT OF VISUAL MODALITIES IN MULTIMODAL PERSONALITY AND AFFECTIVE COMPUTING
Keywords: Personality Computing, Affective Computing, OCEAN, Multi-Task Regression, Multimodal Fusion, Mid-Level Fusion, Cross-Modal Attention, Neural Network
Abstract. Personality and affective computing techniques play a significant role in better understanding human behavior and intentions. Such techniques can be applied in practice in recommendation systems, healthcare, education, and job applicant screening. In this paper, we propose a novel multimodal approach to personality trait assessment that leverages affective features of the human voice and face, as well as recent advances in deep learning. We present a new mid-level modality fusion strategy based on a cross-modal attention mechanism with summarizing functionals. In contrast to other state-of-the-art approaches, we not only analyze the visual scene but also specifically process the human's upper body (selfie) and the scene background. Our experiments show that the Extroversion personality trait is better estimated by fusing the visual scene, face, and audio (voice) modalities, while the Conscientiousness and Agreeableness traits are better assessed by fusing the face, selfie, and audio modalities. Furthermore, our results show that the selfie modality outperforms the visual scene modality by more than 1% in terms of the Concordance Correlation Coefficient. Additionally, our approach based on processing three modalities (selfie, face, and audio) is on par with other known state-of-the-art approaches that employ at least four modalities on the test set of the ChaLearn First Impressions V2 corpus.
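The mid-level fusion idea summarized above (cross-modal attention followed by summarizing functionals) can be sketched as follows. This is a minimal NumPy illustration under assumed design choices, not the paper's implementation: the scaled dot-product attention form, the 64-dimensional frame features, and the mean/standard-deviation functionals are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """Attend from one modality's frames (queries) to another's (keys/values).

    query_feats: (T_q, d) frame features of the querying modality (e.g. audio)
    key_feats:   (T_k, d) frame features of the attended modality (e.g. face)
    Returns (T_q, d) context vectors aligned with the query frames.
    """
    d = query_feats.shape[1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (T_q, T_k)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ key_feats                       # (T_q, d)

def summarize(feats):
    """Summarizing functionals: collapse the time axis into a fixed-length
    vector by concatenating the per-dimension mean and standard deviation."""
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Hypothetical frame-level features for two modalities.
rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 64))  # 50 audio frames, 64-dim each
face = rng.normal(size=(30, 64))   # 30 face frames, 64-dim each

context = cross_modal_attention(audio, face)  # audio attends to face
fused = summarize(context)                    # fixed-length fusion vector
print(fused.shape)                            # (128,)
```

The summarizing functionals remove the dependence on sequence length, so the fused vector can feed a fixed-size multi-task regression head regardless of how many frames each modality contributes.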