Unified acoustic modeling for continuous speech recognition

Speech and silence models are usually trained together on data that matches the type of recognition task. For example, if the task is connected-digit recognition, the corresponding digit models are built using only a connected-digit training corpus. Similarly, for large-vocabulary recognition tasks, the subword or phoneme models are generated using only the subword training set, and alphabet models are trained separately on alphabet training data for letter recognition. In certain applications, however, the developer needs to support mixed-mode operation, such as letters followed by digits, digits followed by keywords, or letters preceded by keywords. There is therefore a need to design a speech recognizer that is robust for such specific applications. In that context, we propose several acoustic modeling techniques to improve the performance of the unified model for applications that require mixed-mode operation.
