Distant Speech Recognition: Bridging the Gaps

While great progress has been made in both fields, there is currently a relatively large rift between researchers engaged in acoustic array processing and those engaged in automatic speech recognition. This is unfortunate for many reasons, but most of all because it prevents the two sides, both of whom are investigating different aspects of the same problem, from truly understanding one another and cooperating. In many cases, the two sides see each other through the eyes of strangers. If groundbreaking progress is to be made in the emerging field of distant speech recognition (DSR), this abysmal state of affairs must change. In this work, we outline five pressing problems in the DSR research field and make initial proposals for their solutions. The problems discussed here are by no means the only ones that must be solved in order to construct truly effective DSR systems. Nonetheless, solving them would, in our view, represent a significant first step towards this goal, inasmuch as the solution of each of these problems requires a substantial change in the mindsets and thought patterns of those engaged in this field of research.
