CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation

In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager and information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Speech was collected from a total of 500 speakers across the United States of America during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words for tasks such as setting radio frequencies or controlling temperature, the CU-Move system focuses on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations and built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section, with a discussion at the end of the chapter.
