High performance digit recognition in real car environments

In this paper, we consider the problem of robust digit recognition in real car environments, using the newly collected CU-Move database [2]. We address the problem with two integrated approaches. First, we combine array processing, speech enhancement, and noise adaptation techniques into an integrated solution. Relative to baseline results, this approach reduced the word error rate (WER) by 38.6% and increased word accuracy (WAC) by 47.1%. Second, as an alternative solution, we combine array processing, enhancement, cepstral mean normalization (CMN), vocal tract length normalization (VTLN), and maximum likelihood linear regression (MLLR) adaptation. The net gain obtained with this solution is a 55.4% reduction in WER and a 64.3% increase in WAC, relative to baseline results. The first approach has the advantage of speed, since all of its operations can be performed in real time, while the second achieves higher accuracy at the cost of increased computational requirements.
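All gains above are quoted relative to the baseline, not as absolute percentage points. The short Python sketch below shows how such relative figures are usually computed: reduction for WER, where lower is better, and increase for WAC, where higher is better. The function names are hypothetical, and the baseline value in the usage line is invented purely for illustration; it is not a result from this paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction in percent, e.g. for WER (lower is better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - new) / baseline


def relative_increase(baseline: float, new: float) -> float:
    """Relative increase in percent, e.g. for WAC (higher is better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


# Hypothetical illustration only: 20.0% is NOT the paper's baseline WER.
baseline_wer = 20.0
system_wer = baseline_wer * (1 - 0.386)  # a 38.6% relative reduction
print(round(relative_reduction(baseline_wer, system_wer), 1))
```

A 38.6% relative reduction from a hypothetical 20.0% baseline WER leaves about 12.3% WER. That is a drop of roughly 7.7 absolute points, which is why relative and absolute figures should not be compared directly.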

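Cepstral mean normalization is one of the components of the second approach. The abstract does not state which CMN variant was used. The Python sketch below therefore assumes the simplest per-utterance form: subtract the mean of each cepstral coefficient, computed over all frames of the utterance. A stationary convolutional channel, such as a fixed microphone response, adds an approximately constant offset in the cepstral domain, and subtracting the mean cancels it. The function name and the list-of-frames layout are illustrative assumptions.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Per-utterance CMN: subtract each coefficient's mean over all frames.

    frames: one list of cepstral coefficients per frame (e.g. MFCCs),
    all of the same length.
    """
    if not frames:
        raise ValueError("expected at least one frame")
    num_coeffs = len(frames[0])
    if any(len(f) != num_coeffs for f in frames):
        raise ValueError("all frames must have the same number of coefficients")
    num_frames = len(frames)
    # Mean of coefficient k across the whole utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(num_coeffs)]
    return [[f[k] - means[k] for k in range(num_coeffs)] for f in frames]


# A constant per-coefficient offset (a simulated channel) is removed entirely.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[c + o for c, o in zip(f, [10.0, -5.0])] for f in clean]
print(cepstral_mean_normalize(clean) == cepstral_mean_normalize(shifted))
```

Because the batch mean needs the whole utterance, a strictly real-time front end would typically replace it with a running or windowed mean estimate instead.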
[1] Wayne H. Ward, et al. A word graph interface for a flexible concept based speech understanding framework, 2001, INTERSPEECH.

[2] Alexander Fischer, et al. Database and online adaptation for improved speech recognition in car environments, 1999, ICASSP.

[3] Jun Huang, et al. A DCT-based fast enhancement technique for robust speech recognition in automobile usage, 1999, EUROSPEECH.

[4] Philip C. Woodland, et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[5] John H. L. Hansen, et al. PCA-PMC: a novel use of a priori knowledge for fast parallel model combination, 2000, ICASSP.

[6] Satoshi Takahashi, et al. Jacobian approach to fast acoustic model adaptation, 1997, ICASSP.

[7] John H. L. Hansen, et al. Robust speech recognition in noise: an evaluation using the SPINE corpus, 2001, INTERSPEECH.

[8] Philip C. Woodland, et al. An investigation into vocal tract length normalisation, 1999, EUROSPEECH.

[9] John H. L. Hansen, et al. "CU-Move": analysis & corpus development for interactive in-vehicle speech systems, 2001, INTERSPEECH.

[10] John H. L. Hansen, et al. A novel algorithm for rapid speaker adaptation based on structural maximum likelihood eigenspace mapping, 2001, INTERSPEECH.

[11] Yariv Ephraim, et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, 1984, IEEE Trans. Acoust., Speech, Signal Process.

[12] Olli Viikki, et al. Low complexity speaker independent command word recognition in car environments, 2000, ICASSP.