Clustering Research on Final Features in Chinese Mandarin under Emotional Differences

Clustering phoneme features under different emotions can reveal the underlying mathematical relationships among phonemes and lay a foundation for phoneme-based emotion analysis. In this paper, an audio-visual bimodal phoneme emotion corpus of Mandarin Chinese is first constructed from self-recorded data, and audio-visual bimodal features of the final phonemes are extracted from the samples in the corpus. Then, by clustering these features with the ISODATA algorithm, the finals in the phonetic environments of different emotions are divided into three and four types, respectively. Finally, the clustering results are compared with the traditional classifications of finals based on structural constituents and on mouth shape during pronunciation, and the comparison results are further analyzed.
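The abstract does not say how agreement between the clustering results and the traditional classifications is measured. One common choice for comparing two partitions of the same items is the Rand index, which is the fraction of item pairs that both partitions treat the same way (together in both, or apart in both). The sketch below is a minimal illustration under that assumption. The function name `rand_index`, the short list of finals, and the `clustered` labels are hypothetical and are not taken from the paper's data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two items are needed")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Illustrative subset of Mandarin finals (not the paper's corpus).
finals = ["a", "o", "e", "ai", "ei", "an", "en", "ang"]

# Traditional structural grouping: 0 = simple, 1 = compound, 2 = nasal.
structural = [0, 0, 0, 1, 1, 2, 2, 2]

# Hypothetical cluster labels, made up for this example only.
clustered = [0, 0, 1, 1, 1, 2, 2, 2]

score = rand_index(structural, clustered)
print(f"Rand index over {len(finals)} finals: {score:.3f}")
```

A value of 1.0 means the two groupings are identical up to a renaming of the group labels. Lower values mean that more pairs of finals are grouped together by one method and separated by the other.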
