Improving the robustness of phonetic segmentation to accent and style variation with a two-staged approach

Correct and temporally accurate phonetic segmentation of speech utterances is important in applications ranging from transcription alignment to pronunciation error detection. Automatic speech recognizers used in these tasks provide insufficient temporal alignment accuracy, and their recognition performance is sensitive to accent and style variations relative to the training data. A two-staged approach that combines HMM broad-class recognition with acoustic-phonetic knowledge-based refinement is evaluated for phonetic segmentation accuracy under accent and style mismatches with the training data.
