Rapid adaptation for mobile speech applications

We investigate iVector-based rapid adaptation for recognition in mobile speech applications. On this task, the proposed approach has two advantages over a linear-transform-based approach. First, it yields larger error reductions (11% vs. 6%) because it is better suited to short utterances and varied recording conditions. Second, it removes the need for speaker data pooling and/or clustering, along with the considerable infrastructure complexity that these entail. Empirical results show that although the proposed utterance-based training algorithm fragments the data heavily, the resulting model re-estimation performs well. Our implementation within the MapReduce framework makes it possible to process the large volume of statistics that this approach generates when applied to a database of thousands of hours of speech.
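
To illustrate how utterance-level statistics might be accumulated in a MapReduce style, the following is a minimal sketch and not the implementation used in this work. It assumes each frame comes with per-Gaussian posteriors and a feature vector, and that the statistics of interest are zeroth-order and first-order sums keyed by Gaussian index. The names `map_utterance`, `reduce_stats`, and `run`, and the data layout, are hypothetical; a real system would distribute the map and reduce steps across machines rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_utterance(utt_id, frames):
    """Map step (hypothetical layout): one utterance in, per-Gaussian stats out.

    frames is a list of (posteriors, feature_vector) pairs, where posteriors[c]
    is the occupation probability of Gaussian c for that frame.
    Yields (gaussian_index, (zeroth_order, first_order)) pairs.
    """
    zeroth = defaultdict(float)
    first = {}
    for posteriors, x in frames:
        for c, gamma in enumerate(posteriors):
            if gamma == 0.0:
                continue  # skip Gaussians that do not contribute to this frame
            zeroth[c] += gamma
            acc = first.setdefault(c, [0.0] * len(x))
            for d, xd in enumerate(x):
                acc[d] += gamma * xd
    for c in zeroth:
        yield c, (zeroth[c], first[c])


def reduce_stats(pairs):
    """Reduce step: sum the statistics emitted for each Gaussian index."""
    totals = {}
    for c, (n, f) in pairs:
        if c not in totals:
            totals[c] = (n, list(f))
        else:
            n0, f0 = totals[c]
            totals[c] = (n0 + n, [a + b for a, b in zip(f0, f)])
    return totals


def run(utterances):
    """Single-process stand-in for a distributed map followed by a reduce."""
    emitted = chain.from_iterable(
        map_utterance(utt_id, frames) for utt_id, frames in utterances.items()
    )
    return reduce_stats(emitted)


if __name__ == "__main__":
    # Toy data: two utterances, two Gaussians, two-dimensional features.
    toy = {
        "utt1": [([1.0, 0.0], [2.0, 4.0])],
        "utt2": [([0.5, 0.5], [2.0, 0.0])],
    }
    for c, (n, f) in sorted(run(toy).items()):
        print(c, n, f)
```

Because each utterance is mapped independently, no speaker pooling or clustering is needed before the map step; only the reduce step combines statistics across utterances.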
