Acoustic parameters for the automatic detection of vowel nasalization

The aim of this work was to propose Acoustic Parameters (APs) for the automatic detection of vowel nasalization based on prior knowledge of the acoustics of nasalized vowels. Nine automatically extractable APs were proposed to capture the most important acoustic correlates of vowel nasalization (extra pole-zero pairs, F1 amplitude reduction, F1 bandwidth increase, and spectral flattening). The performance of these APs was tested on several databases with different sampling rates and recording conditions. In a Support Vector Machine classifier framework, these APs yielded accuracies of 96.28%, 77.90%, and 69.58% on the StoryDB, TIMIT, and WS96/97 databases, respectively. To our knowledge, these are the best results achieved on this task to date.

Index Terms: nasal, nasalization, acoustic parameters, landmark, speech recognition.
