Analysis of CFA-BF: Novel combined fixed/adaptive beamforming for robust speech recognition in real car environments

Many studies have investigated speech enhancement and processing schemes for in-vehicle speech systems. Delay-and-sum beamforming (DASB) and adaptive beamforming are two typical methods, each with its own advantages and disadvantages. In this paper, we build on previous work to propose a novel combined fixed/adaptive beamforming solution (CFA-BF) for speech enhancement and recognition in real moving car environments, which seeks to take advantage of both methods. CFA-BF works in two steps: source location calibration and target signal enhancement. The first step pre-records, in a quiet environment, the transfer functions between the speaker and the microphone array for different potential source positions using adaptive beamforming. The second step uses this pre-recorded information to enhance the desired speech while the car is running on the road. An evaluation on extensive real in-car speech data from the CU-Move Corpus shows that the method reduces word error rate (WER) by up to 30% compared with a single-channel scenario and improves speech quality, as measured by segmental SNR (SEGSNR), by up to 1 dB on average.
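To make two terms from the abstract concrete, the following is a minimal Python sketch of a textbook delay-and-sum beamformer and a basic segmental SNR measure. It is not the CFA-BF algorithm or the paper's evaluation code. The function names, the integer-sample delays, the frame length, and the toy signals are all assumptions chosen for illustration; in particular, a common SEGSNR variant clamps per-frame values to a fixed dB range, which this sketch omits.

```python
import math


def delay_and_sum(channels, delays):
    """Average the channels after advancing each by its integer sample delay.

    channels: list of equal-length sample lists, one per microphone.
    delays:   assumed per-microphone arrival delays in samples (>= 0).
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(n - d):
            # Sample t of the aligned signal is sample t + d of the raw channel.
            out[t] += samples[t + d]
    return [v / len(channels) for v in out]


def segmental_snr_db(clean, processed, frame_len=4):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    Frames with zero reference energy or zero error are skipped (a
    simplification made for this sketch).
    """
    scores = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = processed[start:start + frame_len]
        sig = sum(c * c for c in ref)
        err = sum((c - p) ** 2 for c, p in zip(ref, est))
        if sig > 0 and err > 0:
            scores.append(10.0 * math.log10(sig / err))
    return sum(scores) / len(scores) if scores else float("inf")


# Toy example: the same source reaches the second microphone 2 samples later.
source = [0, 1, 0, -1, 0, 1, 0, -1, 0, 0]
mic_near = source
mic_far = [0, 0] + source[:-2]
enhanced = delay_and_sum([mic_near, mic_far], [0, 2])
```

With correct delays the aligned channels add coherently, so the target is preserved while uncorrelated noise is averaged down. A fixed beamformer of this kind depends on knowing the delays, which is the role the abstract assigns to the calibration step.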
