Data sampling for improved speech recognizer training

Proper data selection for training a speech recognizer can be important for reducing the cost of developing systems for new tasks and of exploratory experiments, but it is also useful for efficiently leveraging the increasingly large speech resources available for training large vocabulary systems. In this work, we investigate various sampling methods, comparing the likelihood criterion to new acoustic measures motivated by work in child language acquisition. The acoustic criteria can be used with or without pre-existing transcriptions or models. When applied to the problem of selecting a small training set, the best results are obtained using modulation spectrum features and a discriminant function trained on child- vs. adult-directed speech. For large corpora, none of the methods outperforms random sampling, but reduced training costs are obtained by using multistage training and initializing with the small corpus.
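As a rough illustration of the likelihood criterion mentioned above, the sketch below ranks utterances by per-frame log-likelihood under a seed model and selects them up to a frame budget. It is a minimal sketch, not the paper's implementation: the `Utterance` fields, the per-frame normalization, and the greedy budget-filling strategy are all assumptions introduced for illustration.

```python
# Illustrative sketch of likelihood-based training-data selection.
# Assumes each utterance already has a total acoustic log-likelihood
# computed under some existing (seed) recognizer; this is not the
# paper's actual selection code.
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    utt_id: str
    log_likelihood: float  # total acoustic log-likelihood under the seed model
    num_frames: int        # utterance length in frames


def select_by_likelihood(utts: List[Utterance], budget_frames: int) -> List[Utterance]:
    """Greedily pick utterances with the highest per-frame log-likelihood
    until the frame budget (a proxy for training-set size) is exhausted."""
    ranked = sorted(utts, key=lambda u: u.log_likelihood / u.num_frames, reverse=True)
    selected, used = [], 0
    for u in ranked:
        if used + u.num_frames > budget_frames:
            continue
        selected.append(u)
        used += u.num_frames
    return selected


if __name__ == "__main__":
    # Toy pool with made-up scores, only to show the ranking step.
    pool = [
        Utterance("utt001", -4200.0, 600),
        Utterance("utt002", -9100.0, 1100),
        Utterance("utt003", -2500.0, 400),
    ]
    subset = select_by_likelihood(pool, budget_frames=1000)
    print([u.utt_id for u in subset])
```

The acoustic criteria described in the paper would replace the per-frame log-likelihood score with a measure that does not require transcriptions or a pre-existing model, such as a score from a discriminant function over modulation spectrum features; the selection step itself stays the same.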