Opportunities and challenges of parallelizing speech recognition

Automatic speech recognition enables a wide range of current and emerging applications, such as automatic transcription, multimedia content analysis, and natural human-computer interfaces. This article offers a glimpse, from the perspective of speech researchers, of the opportunities and challenges that parallelism presents for automatic speech recognition and related application research. The increasing parallelism in computing platforms opens three major possibilities for speech recognition systems: improving recognition accuracy in non-ideal, everyday noisy environments; increasing recognition throughput when batch-processing speech data; and reducing recognition latency in real-time usage scenarios. We describe the technical challenges, the approaches we have taken, and possible directions for future research to guide the design of efficient parallel software and hardware infrastructures.
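To make the throughput opportunity concrete, the sketch below decodes independent utterances in parallel across CPU cores. It is a minimal, hypothetical illustration, not the method of any system cited here: recognize stands in for a real single-utterance decoder (for example, a WFST-based recognizer), and the file names are placeholders.

    # Hypothetical sketch: utterance-level data parallelism for batch transcription.
    # Because utterances are independent, batch throughput can scale with the
    # number of cores until memory bandwidth or I/O becomes the bottleneck.
    from concurrent.futures import ProcessPoolExecutor
    from typing import List

    def recognize(wav_path: str) -> str:
        """Placeholder for a real single-utterance recognizer.

        A real implementation would run feature extraction, acoustic
        scoring, and graph search; here we just echo the input path.
        """
        return f"<transcript of {wav_path}>"

    def transcribe_batch(wav_paths: List[str], workers: int = 4) -> List[str]:
        """Decode independent utterances in parallel worker processes."""
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(recognize, wav_paths))

    if __name__ == "__main__":
        print(transcribe_batch(["a.wav", "b.wav", "c.wav"]))

Note that this coarse-grained, utterance-level parallelism raises throughput but does nothing for the latency of a single utterance; reducing latency requires finer-grained parallelism inside the decoder itself, which is where the harder challenges discussed in the article arise.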
