论文信息 - Rapid adaptation for mobile speech applications

Rapid adaptation for mobile speech applications

We investigate the use of iVector-based rapid adaptation for recognition in mobile speech applications. We show that on this task, the proposed approach has two merits over a linear-transform based approach. First it provides larger error reductions (11% vs. 6%) as it is better suited for the short utterances and varied recording conditions. Second it omits the need for speaker data pooling and/or clustering and the very large infrastructure complexity that accompanies that. Empirical results show that although the proposed utterance-based training algorithm leads to large data fragmentation, the resulting model re-estimation performs well. Our implementation within the MapReduce framework allows processing of the large statistics that this approach gives rise to when applied on a database of thousands of hours.

Michiel Bacchiani | M. Bacchiani

[1] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[2] Michael Picheny,et al. Speaker clustering and transformation for speaker adaptation in large-vocabulary speech recognition systems , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[3] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[4] Jean-Claude Junqua,et al. Maximum likelihood eigenspace and MLLR for speech recognition in noisy environments , 1999, EUROSPEECH.

[5] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[7] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.

[8] Reinhold Häb-Umbach. Automatic generation of phonetic regression class trees for MLLR adaptation , 2001, IEEE Trans. Speech Audio Process..

[9] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[10] Roland Kuhn,et al. Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[11] George Zavaliagkos,et al. Batch, incremental and instantaneous adaptation techniques for speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[12] Fernando Pereira,et al. The AT&t 60,000 word speech-to-text system , 1995, EUROSPEECH.

[13] Mark J. F. Gales. Cluster adaptive training of hidden Markov models , 2000, IEEE Trans. Speech Audio Process..

[14] Patrick Kenny,et al. Eigenvoice modeling with sparse training data , 2005, IEEE Transactions on Speech and Audio Processing.

[15] Michael Picheny,et al. Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[16] Richard M. Schwartz,et al. A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[17] P.C. Woodland,et al. The 1994 HTK large vocabulary speech recognition system , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[18] Francoise Beaufays,et al. Google Search by Voice: A Case Study , 2010 .

[19] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20] Vassilios Digalakis,et al. Speaker adaptation using constrained estimation of Gaussian mixtures , 1995, IEEE Trans. Speech Audio Process..

[21] Adrian Corduneanu,et al. Correlation modeling of MLLR transform biases for rapid HMM adaptation to new speakers , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).