Improving automatic speech recognition robustness for the Romanian language

In this paper we propose an alternative way of improving speech recognition accuracy by analyzing the relevance of individual voice feature dimensions. A new measure is defined to quantify the overlap between feature distributions. Based on this measure, weights for the voice feature dimensions are calculated and then applied to the hypotheses resulting from an N-best recognition process. Experiments are conducted with an Automatic Speech Recognition (ASR) system for the Romanian language. A relative improvement of 22% in Word Error Rate (WER) is obtained.
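
To make the reported metric concrete, the following is a minimal sketch of how WER and a relative WER improvement are commonly computed. It assumes the standard word-level edit-distance definition of WER; the function names `word_error_rate` and `relative_improvement` and the numbers in the usage lines are illustrative assumptions, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new WER is lower than the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (hypothetical, not taken from the experiments).
toy_wer = word_error_rate("a b c d", "a x c")
toy_gain = relative_improvement(0.50, 0.39)
```

Under this definition, a relative improvement is measured against the baseline WER rather than as an absolute difference in percentage points, so the same absolute gain counts for more when the baseline WER is low.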
