The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLVIII-2/W8-2024
https://doi.org/10.5194/isprs-archives-XLVIII-2-W8-2024-419-2024
14 Dec 2024

Identification of Lip Shape during Japanese Pronunciation Using Deep Learning in Point Cloud Video

Shou Shimizu, Ryo Sato, Koki Nakamura, Akira Taguchi, and Yue Bao

Keywords: deep learning, point cloud, silent speech, image processing, LiDAR, face recognition

Abstract. People with speech or hearing impairments, whether congenital or acquired, often have difficulty speaking, yet there is a strong need to communicate through vocalization, which requires practicing correct pronunciation. To extract features from data containing individual differences, deep learning methods are utilized. However, prior work extracts features from point cloud data of lip movement in three-dimensional space without considering temporal sequences. In this work, we first identify temporal depth sequences as new, distinctive sensory information for Japanese pronunciation, utilizing P4Transformer as a spatio-temporal model over point clouds. We then identify Japanese pronunciation from point cloud video by machine learning. In our experiments, the accuracy of vowel and consonant identification was 96.0% and 33.2%, respectively; vowel estimation improved by 10% over the prior approach.
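To make the overall pipeline concrete, the sketch below shows a minimal, hypothetical version of the task described in the abstract: each utterance is a point cloud video (a sequence of per-frame 3D lip point clouds), per-frame geometric features are summarized over time, and a classifier assigns a phoneme label. This is only an illustrative stand-in using hand-crafted features and a nearest-centroid classifier; it is not the authors' P4Transformer model, and all function names here are assumptions.

```python
import numpy as np

def frame_features(points):
    # points: (N, 3) array of lip-region points for one frame.
    # Summarize the frame by its centroid and spread along each axis.
    centroid = points.mean(axis=0)
    spread = points.std(axis=0)
    return np.concatenate([centroid, spread])  # shape (6,)

def video_features(frames):
    # frames: list of (N, 3) arrays forming one point cloud video.
    feats = np.stack([frame_features(f) for f in frames])  # (T, 6)
    # Crude temporal encoding: overall mean plus the change between
    # the second and first half of the sequence (captures lip motion).
    half = len(frames) // 2
    motion = feats[half:].mean(axis=0) - feats[:half].mean(axis=0)
    return np.concatenate([feats.mean(axis=0), motion])  # shape (12,)

class NearestCentroidClassifier:
    # Minimal classifier: label = class whose mean feature vector is closest.
    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = {
            c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
            for c in self.labels_
        }
        return self

    def predict(self, X):
        return [
            min(self.labels_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
            for x in X
        ]
```

In the paper's actual system, `video_features` would be replaced by a learned spatio-temporal representation (P4Transformer) and the classifier by its prediction head; the sketch only fixes the input/output shape of the problem.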