Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion.

An automatic speech recognition approach is presented that uses articulatory features estimated by subject-independent acoustic-to-articulatory inversion. The inversion allows articulatory features to be estimated from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported for a broad-class phonetic classification experiment on speech from English talkers, using data from three distinct English talkers as exemplars for inversion. The results indicate that including the articulatory information improves classification accuracy, but the improvement is more significant when the speaking styles of the exemplar and the talker are matched than when they are mismatched.
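The idea of augmenting a talker's acoustic features with articulatory features borrowed from an exemplar can be illustrated with a toy sketch. This is not the inversion method of the paper: it swaps in a plain nearest-neighbour lookup over the exemplar's paired frames, and the names `invert_frame`, `augment`, and `exemplar_pairs`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
from math import dist

def invert_frame(acoustic_frame, exemplar_pairs):
    """Estimate an articulatory vector for one acoustic frame.

    Assumption: a nearest-neighbour lookup stands in for the real inversion.
    exemplar_pairs is a list of (acoustic_vector, articulatory_vector) tuples
    taken from a single exemplar subject.
    """
    _, articulatory = min(
        exemplar_pairs, key=lambda pair: dist(pair[0], acoustic_frame)
    )
    return articulatory

def augment(acoustic_frames, exemplar_pairs):
    """Append the estimated articulatory features to each acoustic frame."""
    return [
        list(frame) + list(invert_frame(frame, exemplar_pairs))
        for frame in acoustic_frames
    ]

# Toy exemplar map: 2-D acoustic vectors paired with 1-D articulatory vectors.
exemplar_pairs = [
    ((0.0, 0.0), (1.0,)),
    ((1.0, 1.0), (2.0,)),
    ((4.0, 4.0), (3.0,)),
]

# Acoustic frames from a different (hypothetical) talker.
talker_frames = [(0.1, -0.1), (3.5, 4.2)]

# Each row now holds acoustic features followed by estimated articulatory ones.
augmented = augment(talker_frames, exemplar_pairs)
```

In such a setup the augmented vectors, rather than the acoustic vectors alone, would be passed to the phonetic classifier, and no articulatory recordings of the talker are needed.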
