Making an Effective Use of Speech Data for Acoustic Modeling


[1]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[2]  Harry Shum,et al.  Learning to boost GMM based speaker verification , 2003, INTERSPEECH.

[3]  Lin Lawrance Chase.  Error-responsive feedback mechanisms for speech recognizers , 1997 .

[4]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[5]  Hervé Bourlard,et al.  From Multi-Band Full Combination to Multi-Stream Full Combination Processing in Robust ASR , 2000 .

[6]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[7]  Rong Zhang,et al.  Word level confidence annotation using combinations of features , 2001, INTERSPEECH.

[8]  Mei-Yuh Hwang,et al.  The SPHINX-II speech recognition system: an overview , 1993, Comput. Speech Lang..

[9]  Lalit R. Bahl,et al.  Further results on the recognition of a continuously read natural corpus , 1980, ICASSP.

[10]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[11]  Ronald Rosenfeld,et al.  Statistical language modeling using the CMU-cambridge toolkit , 1997, EUROSPEECH.

[12]  Mei Hwang.  Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition , 2001 .

[13]  Fabio Gagliardi Cozman,et al.  Semi-Supervised Learning of Mixture Models and Bayesian Networks , 2003 .

[14]  Rong Zhang,et al.  Comparative study of boosting and non-boosting training for constructing ensembles of acoustic models , 2003, INTERSPEECH.

[15]  Alexander I. Rudnicky,et al.  Investigations on ensemble based semi-supervised acoustic model training , 2005, INTERSPEECH.

[16]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[17]  R. Rosenfeld,et al.  Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.

[18]  Jean-Luc Gauvain,et al.  Lightly supervised acoustic model training using consensus networks , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19]  Vaibhava Goel,et al.  Segmental minimum Bayes-risk decoding for automatic speech recognition , 2004, IEEE Transactions on Speech and Audio Processing.

[20]  Biing-Hwang Juang,et al.  Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[21]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[22]  Holger Schwenk,et al.  Using boosting to improve a hybrid HMM/neural network speech recognizer , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[23]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[24]  Tanja Schultz,et al.  Speaker segmentation and clustering in meetings , 2004, INTERSPEECH.

[25]  Rong Zhang,et al.  Apply n-best list re-ranking to acoustic model combinations of boosting training , 2004, INTERSPEECH.

[26]  Alexander H. Waibel,et al.  Unsupervised training of a speech recognizer: recent experiments , 1999, EUROSPEECH.

[27]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[28]  Salvatore J. Stolfo,et al.  Speech Recognition in Parallel , 1989, HLT.

[29]  Robert P. W. Duin,et al.  Bagging, Boosting and the Random Subspace Method for Linear Classifiers , 2002, Pattern Analysis & Applications.

[30]  Thomas G. Dietterich.  Machine-Learning Research: Four Current Directions , 1997 .

[31]  Thomas G. Dietterich.  An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.

[32]  Gökhan Tür,et al.  Extending boosting for call classification using word confusion networks , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[33]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  Ludmila I. Kuncheva,et al.  Relationships between combination methods and measures of diversity in combining classifiers , 2002, Inf. Fusion.

[35]  Yoram Singer,et al.  Logistic Regression, AdaBoost and Bregman Distances , 2000, Machine Learning.

[36]  João Paulo da Silva Neto,et al.  Combination of acoustic models in continuous speech recognition hybrid systems , 2000, INTERSPEECH.

[37]  Lie Lu,et al.  Speech segmentation without speech recognition , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[38]  Jean-Luc Gauvain,et al.  Combining multiple speech recognizers using voting and language model information , 2000, INTERSPEECH.

[39]  Bhiksha Raj,et al.  A boosting approach for confidence scoring , 2001, INTERSPEECH.

[40]  Ralf Schlüter,et al.  Using word probabilities as confidence measures , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[41]  Charles W. Therrien,et al.  Discrete Random Signals and Statistical Signal Processing , 1992 .

[42]  Heidi Christensen,et al.  Employing heterogeneous information in a multi-stream framework , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[43]  Say Wei Foo,et al.  Speaker recognition using adaptively boosted decision tree classifier , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[44]  Christophe Ambroise,et al.  Semi-supervised MarginBoost , 2001, NIPS.

[45]  Richard M. Stern,et al.  Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[46]  Jean-Luc Gauvain,et al.  Unsupervised acoustic model training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[47]  Harvey Fletcher,et al.  Speech and hearing in communication , 1953 .

[48]  Hervé Bourlard,et al.  Subband-based speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[49]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[50]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent in Function Space , 2007 .

[51]  William J. Byrne,et al.  Minimum risk acoustic clustering for multilingual acoustic model combination , 2000, INTERSPEECH.

[52]  Richard M. Stern,et al.  Lattice Combination for Improved Speech Recognition , 2001 .

[53]  Alexander G. Hauptmann,et al.  Improving acoustic models with captioned multimedia speech , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[54]  Robert E. Schapire,et al.  A Brief Introduction to Boosting , 1999, IJCAI.

[55]  Katrin Weber.  Multiple Timescale Feature Combination Towards Robust Speech Recognition , 2000, KONVENS.

[56]  David E. Reynolds,et al.  Automatic segmentation , 1986 .

[57]  Vaibhava Goel,et al.  Task adaptation of acoustic and language models based on large quantities of data , 2004, INTERSPEECH.

[58]  Jeff A. Bilmes,et al.  Combination and Joint Training of Acoustic Classifiers for Speech Recognition , 2000 .

[59]  Sangita R. Sharma,et al.  Multi-stream approach to robust speech recognition , 1999 .

[60]  Hervé Bourlard,et al.  Adaptive ML-weighting in multi-band recombination of Gaussian mixture ASR , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[61]  Alexander I. Rudnicky,et al.  Creating natural dialogs in the carnegie mellon communicator system , 1999, EUROSPEECH.

[62]  Gerard G. L. Meyer,et al.  Automatic selection of transcribed training material , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[63]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[64]  Daniel P. W. Ellis,et al.  Multi-stream speech recognition: ready for prime time? , 1999, EUROSPEECH.

[65]  Hervé Bourlard,et al.  Error correcting posterior combination for robust multi-band speech recognition , 2001, INTERSPEECH.

[66]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[67]  Vaibhava Goel,et al.  Minimum Bayes-risk automatic speech recognition , 2000, Comput. Speech Lang..

[68]  Vaibhava Goel,et al.  Segmental minimum Bayes-risk ASR voting strategies , 2000, INTERSPEECH.

[69]  Rong Zhang,et al.  Improving the performance of an LVCSR system through ensembles of acoustic models , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[70]  Rong Zhang,et al.  A frame level boosting training scheme for acoustic modeling , 2004, INTERSPEECH.

[71]  C. Ris,et al.  Multi-band with contaminated training data , 2001 .

[72]  Gernot A. Fink,et al.  Conversational speech recognition using acoustic and articulatory input , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[73]  Hynek Hermansky,et al.  Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[74]  Hermann Ney,et al.  Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[75]  Wei Fan,et al.  Bagging , 2009, Encyclopedia of Machine Learning.

[76]  Steve R. Waterhouse,et al.  Ensemble methods for connectionist acoustic modelling , 1997, EUROSPEECH.

[77]  Thomas G. Dietterich.  Machine-Learning Research , 1997, AI Mag..

[78]  Samy Bengio,et al.  Boosting word error rates , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[79]  Steven Greenberg,et al.  Incorporating information from syllable-length time scales into automatic speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[80]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[81]  Alexander I. Rudnicky,et al.  Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture , 2004 .

[82]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[83]  Hermann Ney,et al.  Comparison of discriminative training criteria and optimization methods for speech recognition , 2001, Speech Commun..

[84]  M. L. Shire,et al.  Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environments , 2000 .

[85]  Chin-Hui Lee,et al.  A hybrid algorithm for speaker adaptation using MAP transformation and adaptation , 1997, IEEE Signal Processing Letters.

[86]  Hynek Hermansky,et al.  Towards subband-based speech recognition , 1996, 1996 8th European Signal Processing Conference (EUSIPCO 1996).

[87]  Samy Bengio,et al.  Boosting HMMs with an application to speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[88]  Rong Zhang,et al.  Investigations of issues for using multiple acoustic models to improve continuous speech recognition , 2006, INTERSPEECH.

[89]  Hermann Ney,et al.  Acoustic feature combination for robust speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[90]  Hermann Ney,et al.  Investigations on error minimizing training criteria for discriminative training in automatic speech recognition , 2005, INTERSPEECH.

[91]  Spyridon Matsoukas,et al.  Unsupervised Training on a Large Amount of Arabic Broadcast News Data , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[92]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[93]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[94]  Shivani Agarwal,et al.  An Experimental Study of EM-Based Algorithms for Semi-Supervised Learning in Audio Classification , 2003 .

[95]  G. Tur,et al.  Model adaptation for spoken language understanding , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[96]  Marilyn A. Walker,et al.  Learning to personalize spoken generation for dialogue systems , 2005, INTERSPEECH.

[97]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[98]  L. Breiman.  Arcing Classifiers , 1998 .

[99]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[100]  M. A. Siegler,et al.  Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .

[101]  Carsten Meyer.  Utterance-level boosting of HMM speech recognizers , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[102]  Ralf Schlüter,et al.  Investigations on discriminative training criteria , 2000 .

[103]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[104]  Dilek Z. Hakkani-Tür,et al.  Using context to improve emotion detection in spoken dialog systems , 2005, INTERSPEECH.

[105]  Srinivas Bangalore,et al.  Combining prior knowledge and boosting for call classification in spoken language dialogue , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[106]  Bing Xiang,et al.  Light supervision in acoustic model training , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[107]  Ayhan Demiriz,et al.  Exploiting unlabeled data in ensemble methods , 2002, KDD.

[108]  S. Katagiri,et al.  Discriminative Learning for Minimum Error Classification , 2009 .

[109]  Geoffrey Zweig,et al.  Boosting Gaussian mixtures in an LVCSR system , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[110]  Chin-Hui Lee,et al.  Combination of boosting and discriminative training for natural language call steering systems , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[111]  Dilek Z. Hakkani-Tür,et al.  Active and unsupervised learning for automatic speech recognition , 2003, INTERSPEECH.

[112]  Hervé Bourlard,et al.  Using multiple time scales in the framework of multi-stream speech recognition , 2000, INTERSPEECH.

[113]  Brian Kingsbury,et al.  Constructing ensembles of ASR systems using randomized decision trees , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[114]  Rong Zhang,et al.  Is this conversation on track? , 2001, INTERSPEECH.

[115]  Hervé Bourlard,et al.  Subband-Based Speech Recognition in Noisy Conditions: The Full Combination Approach , 1998 .

[116]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[117]  Mosur Ravishankar,et al.  Efficient Algorithms for Speech Recognition. , 1996 .

[118]  Robert P. W. Duin,et al.  An experimental study on diversity for bagging and boosting with linear classifiers , 2002, Inf. Fusion.

[119]  Anthony J. Robinson,et al.  Boosting the performance of connectionist large vocabulary speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[120]  Marilyn A. Walker,et al.  A trainable generator for recommendations in multimodal dialog , 2003, INTERSPEECH.

[121]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences , 1980 .

[122]  Jean-Luc Gauvain,et al.  Lightly Supervised Acoustic Model Training , 2000 .

[123]  Oh-Wook Kwon,et al.  Optimizing speech/non-speech classifier design using AdaBoost , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[124]  Anthony J. Robinson,et al.  Utterance clustering for large vocabulary continuous speech recognition , 1995, EUROSPEECH.

[125]  Ziyou Xiong,et al.  Boosting Speech/Non-speech Classification Using Averaged Mel-Frequency Cepstrum Coefficients Features , 2002, IEEE Pacific Rim Conference on Multimedia.

[126]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[127]  Alexander G. Hauptmann,et al.  Learning to Recognize Speech by Watching Television , 1999, IEEE Intell. Syst..

[128]  Sadaoki Furui,et al.  Stream-weight optimization by LDA and adaboost for multi-stream speaker verification , 2005, INTERSPEECH.

[129]  Rong Zhang,et al.  A New Data Selection Approach for Semi-Supervised Acoustic Modeling , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[130]  Hermann Ney,et al.  Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[131]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..