Paper information - Making an Effective Use of Speech Data for Acoustic Modeling

Making an Effective Use of Speech Data for Acoustic Modeling


Rong Zhang

[1] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.

[2] Harry Shum,et al. Learning to boost GMM based speaker verification , 2003, INTERSPEECH.

[3] Lin Lawrance Chase. Error-responsive feedback mechanisms for speech recognizers , 1997 .

[4] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.

[5] Hervé Bourlard,et al. From Multi-Band Full Combination to Multi-Stream Full Combination Processing in Robust ASR , 2000 .

[6] Jonathan G. Fiscus,et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[7] Rong Zhang,et al. Word level confidence annotation using combinations of features , 2001, INTERSPEECH.

[8] Mei-Yuh Hwang,et al. The SPHINX-II speech recognition system: an overview , 1993, Comput. Speech Lang..

[9] Lalit R. Bahl,et al. Further results on the recognition of a continuously read natural corpus , 1980, ICASSP.

[10] Raj Reddy,et al. Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[11] Ronald Rosenfeld,et al. Statistical language modeling using the CMU-cambridge toolkit , 1997, EUROSPEECH.

[12] Mei Hwang. Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition , 2001 .

[13] Fabio Gagliardi Cozman,et al. Semi-Supervised Learning of Mixture Models and Bayesian Networks , 2003 .

[14] Rong Zhang,et al. Comparative study of boosting and non-boosting training for constructing ensembles of acoustic models , 2003, INTERSPEECH.

[15] Alexander I. Rudnicky,et al. Investigations on ensemble based semi-supervised acoustic model training , 2005, INTERSPEECH.

[16] Yoav Freund,et al. Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[17] R. Rosenfeld,et al. Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.

[18] Jean-Luc Gauvain,et al. Lightly supervised acoustic model training using consensus networks , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19] Vaibhava Goel,et al. Segmental minimum Bayes-risk decoding for automatic speech recognition , 2004, IEEE Transactions on Speech and Audio Processing.

[20] Biing-Hwang Juang,et al. Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[21] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[22] Holger Schwenk,et al. Using boosting to improve a hybrid HMM/neural network speech recognizer , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[23] Avrim Blum,et al. The Bottleneck , 2021, Monopsony Capitalism.

[24] Tanja Schultz,et al. Speaker segmentation and clustering in meetings , 2004, INTERSPEECH.

[25] Rong Zhang,et al. Apply n-best list re-ranking to acoustic model combinations of boosting training , 2004, INTERSPEECH.

[26] Alexander H. Waibel,et al. Unsupervised training of a speech recognizer: recent experiments , 1999, EUROSPEECH.

[27] Thomas G. Dietterich,et al. Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[28] Salvatore J. Stolfo,et al. Speech Recognition in Parallel , 1989, HLT.

[29] Robert P. W. Duin,et al. Bagging, Boosting and the Random Subspace Method for Linear Classifiers , 2002, Pattern Analysis & Applications.

[30] Thomas G. Dietterich. Machine-Learning Research Four Current Directions , 1997 .

[31] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.

[32] Gökhan Tür,et al. Extending boosting for call classification using word confusion networks , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[33] Tin Kam Ho,et al. The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[34] Ludmila I. Kuncheva,et al. Relationships between combination methods and measures of diversity in combining classifiers , 2002, Inf. Fusion.

[35] Yoram Singer,et al. Logistic Regression, AdaBoost and Bregman Distances , 2000, Machine Learning.

[36] João Paulo da Silva Neto,et al. Combination of acoustic models in continuous speech recognition hybrid systems , 2000, INTERSPEECH.

[37] Lie Lu,et al. Speech segmentation without speech recognition , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[38] Jean-Luc Gauvain,et al. Combining multiple speech recognizers using voting and language model information , 2000, INTERSPEECH.

[39] Bhiksha Raj,et al. A boosting approach for confidence scoring , 2001, INTERSPEECH.

[40] Ralf Schlüter,et al. Using word probabilities as confidence measures , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[41] Charles W. Therrien,et al. Discrete Random Signals and Statistical Signal Processing , 1992 .

[42] Heidi Christensen,et al. Employing heterogeneous information in a multi-stream framework , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[43] Say Wei Foo,et al. Speaker recognition using adaptively boosted decision tree classifier , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[44] Christophe Ambroise,et al. Semi-supervised MarginBoost , 2001, NIPS.

[45] Richard M. Stern,et al. Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[46] Jean-Luc Gauvain,et al. Unsupervised acoustic model training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[47] Harvey b. Fletcher,et al. Speech and hearing in communication , 1953 .

[48] Hervé Bourlard,et al. Subband-based speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[49] Andreas Stolcke,et al. The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[50] Peter L. Bartlett,et al. Boosting Algorithms as Gradient Descent in Function Space , 2007 .

[51] William J. Byrne,et al. Minimum risk acoustic clustering for multilingual acoustic model combination , 2000, INTERSPEECH.

[52] Richard M. Stern,et al. Lattice Combination for Improved Speech Recognition , 2001 .

[53] Alexander G. Hauptmann,et al. Improving acoustic models with captioned multimedia speech , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[54] Robert E. Schapire,et al. A Brief Introduction to Boosting , 1999, IJCAI.

[55] Katrin Weber. Multiple Timescale Feature Combination Towards Robust Speech Recognition , 2000, KONVENS.

[56] David E. Reynolds,et al. Automatic segmentation , 1986 .

[57] Vaibhava Goel,et al. Task adaptation of acoustic and language models based on large quantities of data , 2004, INTERSPEECH.

[58] Jeff A. Bilmes,et al. Combination and Joint Training of Acoustic Classifiers for Speech Recognition , 2000 .

[59] Sangita R. Sharma,et al. Multi-stream approach to robust speech recognition , 1999 .

[60] Hervé Bourlard,et al. Adaptive ML-weighting in multi-band recombination of Gaussian mixture ASR , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[61] Alexander I. Rudnicky,et al. Creating natural dialogs in the carnegie mellon communicator system , 1999, EUROSPEECH.

[62] Gerard G. L. Meyer,et al. Automatic selection of transcribed training material , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[63] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .

[64] Daniel P. W. Ellis,et al. Multi-stream speech recognition: ready for prime time? , 1999, EUROSPEECH.

[65] Hervé Bourlard,et al. Error correcting posterior combination for robust multi-band speech recognition , 2001, INTERSPEECH.

[66] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[67] Vaibhava Goel,et al. Minimum Bayes-risk automatic speech recognition , 2000, Comput. Speech Lang..

[68] Vaibhava Goel,et al. Segmental minimum Bayes-risk ASR voting strategies , 2000, INTERSPEECH.

[69] Rong Zhang,et al. Improving the performance of an LVCSR system through ensembles of acoustic models , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[70] Rong Zhang,et al. A frame level boosting training scheme for acoustic modeling , 2004, INTERSPEECH.

[71] C. Ris,et al. Multi-band with contaminated training data , 2001 .

[72] Gernot A. Fink,et al. Conversational speech recognition using acoustic and articulatory input , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[73] Hynek Hermansky,et al. Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[74] Hermann Ney,et al. Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[75] Wei Fan,et al. Bagging , 2009, Encyclopedia of Machine Learning.

[76] Steve R. Waterhouse,et al. Ensemble methods for connectionist acoustic modelling , 1997, EUROSPEECH.

[77] Thomas G. Dietterich. Machine-Learning Research , 1997, AI Mag..

[78] Samy Bengio,et al. Boosting word error rates , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[79] Steven Greenberg,et al. Incorporating information from syllable-length time scales into automatic speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[80] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[81] Alexander I. Rudnicky,et al. Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture , 2004 .

[82] L. Baum,et al. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[83] Hermann Ney,et al. Comparison of discriminative training criteria and optimization methods for speech recognition , 2001, Speech Commun..

[84] M. L. Shire,et al. Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environmen , 2000 .

[85] Chin-Hui Lee,et al. A hybrid algorithm for speaker adaptation using MAP transformation and adaptation , 1997, IEEE Signal Processing Letters.

[86] Hynek Hermansky,et al. Towards subband-based speech recognition , 1996, 1996 8th European Signal Processing Conference (EUSIPCO 1996).

[87] Samy Bengio,et al. Boosting HMMs with an application to speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[88] Rong Zhang,et al. Investigations of issues for using multiple acoustic models to improve continuous speech recognition , 2006, INTERSPEECH.

[89] Hermann Ney,et al. Acoustic feature combination for robust speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[90] Hermann Ney,et al. Investigations on error minimizing training criteria for discriminative training in automatic speech recognition , 2005, INTERSPEECH.

[91] Spyridon Matsoukas,et al. Unsupervised Training on a Large Amount of Arabic Broadcast News Data , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[92] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[93] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[94] Shivani Agarwal,et al. An Experimental Study of EM-Based Algorithms for Semi-Supervised Learning in Audio Classification , 2003 .

[95] G. Tur,et al. Model adaptation for spoken language understanding , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[96] Marilyn A. Walker,et al. Learning to personalize spoken generation for dialogue systems , 2005, INTERSPEECH.

[97] Leo Breiman,et al. Random Forests , 2001, Machine Learning.

[98] L. Breiman. Arcing Classifiers , 1998 .

[99] Ludmila I. Kuncheva,et al. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[100] M. A. Siegler,et al. Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .

[101] Carsten Meyer. Utterance-level boosting of HMM speech recognizers , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[102] Ralf Schlüter,et al. Investigations on discriminative training criteria , 2000 .

[103] J. Thompson,et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[104] Dilek Z. Hakkani-Tür,et al. Using context to improve emotion detection in spoken dialog systems , 2005, INTERSPEECH.

[105] Srinivas Bangalore,et al. Combining prior knowledge and boosting for call classification in spoken language dialogue , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[106] Bing Xiang,et al. Light supervision in acoustic model training , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[107] Ayhan Demiriz,et al. Exploiting unlabeled data in ensemble methods , 2002, KDD.

[108] S. Katagiri,et al. Discriminative Learning for Minimum Error Classification , 2009 .

[109] Geoffrey Zweig,et al. Boosting Gaussian mixtures in an LVCSR system , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[110] Chin-Hui Lee,et al. Combination of boosting and discriminative training for natural language call steering systems , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[111] Dilek Z. Hakkani-Tür,et al. Active and unsupervised learning for automatic speech recognition , 2003, INTERSPEECH.

[112] Hervé Bourlard,et al. Using multiple time scales in the framework of multi-stream speech recognition , 2000, INTERSPEECH.

[113] Brian Kingsbury,et al. Constructing ensembles of ASR systems using randomized decision trees , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[114] Rong Zhang,et al. Is this conversation on track? , 2001, INTERSPEECH.

[115] Hervé Bourlard,et al. Subband-Based Speech Recognition in Noisy Conditions: The Full Combination Approach , 1998 .

[116] Yoram Singer,et al. Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[117] Mosur Ravishankar,et al. Efficient Algorithms for Speech Recognition. , 1996 .

[118] Robert P. W. Duin,et al. An experimental study on diversity for bagging and boosting with linear classifiers , 2002, Inf. Fusion.

[119] Anthony J. Robinson,et al. Boosting the performance of connectionist large vocabulary speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[120] Marilyn A. Walker,et al. A trainable generator for recommendations in multimodal dialog , 2003, INTERSPEECH.

[121] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[122] Jean-Luc Gauvain,et al. Lightly Supervised Acoustic Model Training , 2000 .

[123] Oh-Wook Kwon,et al. Optimizing speech/non-speech classifier design using AdaBoost , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[124] Anthony J. Robinson,et al. Utterance clustering for large vocabulary continuous speech recognition , 1995, EUROSPEECH.

[125] Ziyou Xiong,et al. Boosting Speech/Non-speech Classification Using Averaged Mel-Frequency Cepstrum Coefficients Features , 2002, IEEE Pacific Rim Conference on Multimedia.

[126] Sebastian Thrun,et al. Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[127] Alexander G. Hauptmann,et al. Learning to Recognize Speech by Watching Television , 1999, IEEE Intell. Syst..

[128] Sadaoki Furui,et al. Stream-weight optimization by LDA and adaboost for multi-stream speaker verification , 2005, INTERSPEECH.

[129] Rong Zhang,et al. A New Data Selection Approach for Semi-Supervised Acoustic Modeling , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[130] Hermann Ney,et al. Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[131] Andreas Stolcke,et al. Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..