Paper information - Yet another acoustic representation of speech sounds

Yet another acoustic representation of speech sounds

This paper proposes yet another acoustic representation of speech sounds. In theory, the proposed speech model removes both multiplicative and linear transformational distortion from speech. Speech sounds are therefore represented without being affected by the static distortion inevitably introduced in the production, encoding, transmission, decoding, and hearing processes, such as differences in vocal tract length, gender, age, microphone, room, line, and auditory characteristics. The method acoustically models not individual phones but their entire system, focusing only on the acoustic interrelations among all the phones. Because the method provides no absolute acoustic properties of phones, it cannot recognize or synthesize even a single phone. Nevertheless, the method is shown to be effective and reliable for pronunciation assessment, where pronunciation proficiency is estimated without directly matching against acoustic models of the individual phones.
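The idea of keeping only the interrelations among phones can be illustrated with a small numerical sketch. The abstract does not specify the distance measure or the distortion model, so the following are assumptions made only for illustration: each phone is a Gaussian in a cepstrum-like feature space, static distortion is an invertible affine map `c' = A c + b`, and the interrelations are measured with the Bhattacharyya distance, which is known to be invariant under such maps. The names `bhattacharyya`, `structure_matrix`, and `apply_affine` are hypothetical helpers, and the data are random, not speech.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term_mean = diff @ np.linalg.solve(cov, diff) / 8.0
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov

def structure_matrix(means, covs):
    """Matrix of pairwise distances between all phone models (assumed structure)."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist

def apply_affine(means, covs, A, b):
    """Apply the assumed static distortion c' = A c + b to every phone model."""
    new_means = [A @ m + b for m in means]
    new_covs = [A @ c @ A.T for c in covs]
    return new_means, new_covs

# Synthetic "phones": random Gaussians, purely illustrative.
rng = np.random.default_rng(0)
dim, n_phones = 3, 5
means = [rng.normal(size=dim) for _ in range(n_phones)]
covs = []
for _ in range(n_phones):
    m = rng.normal(size=(dim, dim))
    covs.append(m @ m.T + dim * np.eye(dim))  # symmetric positive definite

# A fixed invertible matrix (determinant 3.95) and a shift vector.
A = np.array([[2.0, 0.5, 0.0],
              [-0.3, 1.5, 0.2],
              [0.1, -0.4, 1.2]])
b = np.array([1.0, -2.0, 0.5])

original = structure_matrix(means, covs)
distorted = structure_matrix(*apply_affine(means, covs, A, b))
max_diff = np.max(np.abs(original - distorted))
print("largest change in any pairwise distance:", max_diff)
```

Under these assumptions the pairwise distances agree up to floating-point error even though every individual mean and covariance has changed. This is consistent with the abstract's point that such a representation carries no absolute acoustic properties of a single phone.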

Nobuaki Minematsu

[1] Nobuaki Minematsu, et al. English Speech Database Read by Japanese Learners for CALL System Development, 2002, LREC.

[2] Helmer Strik, et al. Automatic Speech Recognition for second language learning: How and why it actually works, 2003.

[3] Hermann Ney, et al. Vocal tract normalization equals linear transformation in cepstral space, 2001, IEEE Transactions on Speech and Audio Processing.

[4] Philip C. Woodland, et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[5] Steve J. Young, et al. Language learning based on non-native speech recognition, 1997, EUROSPEECH.

[6] M. Halle, et al. Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates, 1961.

[7] Masatsune Tamura, et al. A Context Clustering Technique for Average Voice Models, 2003.

[8] Hermann Ney, et al. Vocal tract normalization as linear transformation of MFCC, 2003, INTERSPEECH.