IMPACT OF VISUAL MODALITIES IN MULTIMODAL PERSONALITY AND AFFECTIVE COMPUTING
Keywords: Personality Computing, Affective Computing, OCEAN, Multi-Task Regression, Multimodal Fusion, Mid-Level Fusion, Cross-Modal Attention, Neural Network
Abstract. Personality and affective computing techniques play a significant role in better understanding human behavior and intentions. Such techniques can be applied in practice in recommendation systems, healthcare, education, and job applicant screening. In this paper, we propose a novel multimodal approach to personality trait assessment that leverages affective features of the human voice and face, as well as recent advances in deep learning. We present a new mid-level modality fusion strategy based on a cross-modal attention mechanism with summarizing functionals. In contrast to other state-of-the-art approaches, we not only analyze the visual scene but also specifically process the human's upper body (selfie) and the scene background. Our experiments show that the Extroversion personality trait is better estimated by fusing the visual scene, face, and audio (voice) modalities, while the Conscientiousness and Agreeableness traits are better assessed by fusing the face, selfie, and audio modalities. Furthermore, our results show that the selfie modality outperforms the visual scene modality by more than 1% in terms of the Concordance Correlation Coefficient. Additionally, our approach based on processing three modalities (selfie, face, and audio) is on par with other known state-of-the-art approaches that employ at least four modalities on the test set of the ChaLearn First Impressions V2 corpus.
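The mid-level fusion idea summarized above (cross-modal attention followed by summarizing functionals) can be sketched as follows. This is a minimal NumPy illustration under assumed design choices, not the paper's implementation: the scaled dot-product attention form, the 64-dimensional frame features, and the mean/standard-deviation functionals are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """Attend from one modality's frames (queries) to another's (keys/values).

    query_feats: (T_q, d) frame features of the querying modality (e.g. audio)
    key_feats:   (T_k, d) frame features of the attended modality (e.g. face)
    Returns (T_q, d) context vectors aligned with the query frames.
    """
    d = query_feats.shape[1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (T_q, T_k)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ key_feats                       # (T_q, d)

def summarize(feats):
    """Summarizing functionals: collapse the time axis into a fixed-length
    vector by concatenating the per-dimension mean and standard deviation."""
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Hypothetical frame-level features for two modalities.
rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 64))  # 50 audio frames, 64-dim each
face = rng.normal(size=(30, 64))   # 30 face frames, 64-dim each

context = cross_modal_attention(audio, face)  # audio attends to face
fused = summarize(context)                    # fixed-length fusion vector
print(fused.shape)                            # (128,)
```

The summarizing functionals remove the dependence on sequence length, so the fused vector can feed a fixed-size multi-task regression head regardless of how many frames each modality contributes.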