Toward Speech Articulation Detection through Smartphone

Articulation problems seriously impair speech communication and comprehension. Automatic Speech Recognition (ASR) has therefore been applied to detect and analyze articulation from speech signals in applications such as clinical protocols, foreign language learning, and language proficiency testing. However, articulation detection and analysis have not been adequately evaluated because of their complex nature. The main challenge is that the speech signal alone contains insufficient information for articulation detection and analysis. We therefore propose an alternative approach: a system that senses the user's articulatory organs (tongue and lips) based on phonetic rules. The system uses speech and ultrasonic signals simultaneously to read lip shape and tongue position. We also implemented the proposed technique on an off-the-shelf smartphone to improve its applicability in real-world scenarios. We evaluated the system on four languages: French, Japanese, Korean, and Mandarin Chinese. The evaluation shows that the system robustly recognizes vowel articulation, with an overall accuracy of 94.74%.
