Machine Translation from Speech

This chapter describes approaches for translation from speech. Translation from speech presents two new issues. First, of course, we must recognize the speech in the source language. Although speech recognition has improved considerably over the last three decades, it is still far from being a solved problem. In the best of conditions, when the speech comes from high quality, carefully enunciated speech, on common topics (such as speech read by a trained news broadcaster), the word error rate is typically on the order of 5%. Humans can typically transcribe speech like this with less than 1% disagreement between annotators, so even this best number is still far worse than human performance. However, the task gets much harder when anything changes from this ideal condition. Some of the conditions that cause higher error rate are, if the topic is somewhat unusual, or the speakers are not reading so that their speech is more spontaneous, or if the speakers have an accent or are speaking a dialect, or if there is any acoustic degradation, such as noise or reverberation. In these cases, the word error can increase significantly to 20%, 30%, or higher. Accordingly, most of this chapter discusses techniques for improving speech recognition accuracy, while one section discusses techniques for integrating speech recognition with translation.

[1]  Andreas Stolcke,et al.  THE SRI MARCH 2000 HUB-5 CONVERSATIONAL SPEECH TRANSCRIPTION SYSTEM , 2000 .

[2]  Tanja Schultz,et al.  Correlated Bigram LSA for Unsupervised Language Model Adaptation , 2008, NIPS.

[3]  Hermann Ney,et al.  Advances in Arabic broadcast news transcription at RWTH , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[4]  Jont B. Allen,et al.  How do humans process and recognize speech? , 1993, IEEE Trans. Speech Audio Process..

[5]  Nizar Habash,et al.  Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop , 2005, ACL.

[6]  Wen Wang,et al.  Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Jont B. Allen,et al.  Multiband product rule and consonant identification. , 2009, The Journal of the Acoustical Society of America.

[8]  N. Morgan,et al.  Pushing the envelope - aside [speech recognition] , 2005, IEEE Signal Processing Magazine.

[9]  Frédéric Bimbot,et al.  Variable-length sequence matching for phonetic transcription using joint multigrams , 1995, EUROSPEECH.

[10]  Puming Zhan,et al.  Speaker normalization based on frequency warping , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Martin A. Riedmiller,et al.  A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[12]  J. Xu,et al.  Audio Indexing of Arabic broadcast news , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Gunnar Evermann,et al.  Posterior probability decoding, confidence estimation and system combination , 2000 .

[14]  Long Nguyen,et al.  Progress in the BBN 2007 Mandarin Speech to Text system , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[15]  Klaus Ries,et al.  The Karlsruhe-Verbmobil speech recognition engine , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Fabio Valente,et al.  Hierarchical and parallel processing of modulation spectrum for ASR applications , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17]  Mark J. F. Gales,et al.  Discriminative map for acoustic model adaptation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[18]  T. Houtgast Frequency selectivity in amplitude-modulation detection. , 1989, The Journal of the Acoustical Society of America.

[19]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[20]  Mark J. F. Gales,et al.  Development of the CUHTK 2004 Mandarin conversational telephone speech transcription system , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[21]  Andreas Stolcke,et al.  Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[22]  Shankar Kumar,et al.  A weighted finite state transducer translation template model for statistical machine translation , 2006, Nat. Lang. Eng..

[23]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[24]  Geoffrey Zweig,et al.  Advances in speech transcription at IBM under the DARPA EARS program , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[25]  Hynek Hermansky,et al.  Should recognizers have ears? , 1998, Speech Commun..

[26]  B. Kollmeier,et al.  Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. , 1997, The Journal of the Acoustical Society of America.

[27]  James R. Glass,et al.  Style & Topic Language Model Adaptation Using HMM-LDA , 2006, EMNLP.

[28]  Bing Xiang,et al.  Morphological Decomposition for Arabic Broadcast News Transcription , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[29]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[30]  D. Giuliani,et al.  Acoustic Model Adaptation with Multiple Supervisions , 2006 .

[31]  Jirí Navrátil,et al.  Recent advances in phonotactic language recognition using binary-decision trees , 2006, INTERSPEECH.

[32]  Daniel P. W. Ellis,et al.  Tandem acoustic modeling in large-vocabulary recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[33]  Yaser Al-Onaizan,et al.  Distortion Models for Statistical Machine Translation , 2006, ACL.

[34]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[35]  N. Bertoldi,et al.  A new decoder for spoken language translation based on confusion networks , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[36]  Shankar Kumar,et al.  Local Phrase Reordering Models for Statistical Machine Translation , 2005, HLT.

[37]  Ruhi Sarikaya,et al.  Joint Morphological-Lexical Language Modeling for Machine Translation , 2007, NAACL.

[38]  Jean-Luc Gauvain,et al.  Modeling vowels for Arabic BN transcription , 2005, INTERSPEECH.

[39]  William J. Byrne,et al.  Statistical Phrase-Based Speech Translation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[40]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[41]  Andreas Stolcke,et al.  Morphology-based language modeling for arabic speech recognition , 2004, INTERSPEECH.

[42]  Mari Ostendorf,et al.  Modeling long distance dependence in language: topic mixtures versus dynamic cache models , 1996, IEEE Trans. Speech Audio Process..

[43]  Biing-Hwang Juang,et al.  Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[44]  George Saon,et al.  Large margin semi-tied covariance transforms for discriminative training , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[45]  Hervé Bourlard,et al.  New entropy based combination rules in HMM/ANN multi-stream ASR , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[46]  Stanley F. Chen,et al.  Conditional and joint models for grapheme-to-phoneme conversion , 2003, INTERSPEECH.

[47]  Ahmad Emami,et al.  A Neural Syntactic Language Model , 2005, Machine Learning.

[48]  Ivica Rogina,et al.  Integrating dynamic speech modalities into context decision trees , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[49]  M. J. D. Powell,et al.  An efficient method for finding the minimum of a function of several variables without calculating derivatives , 1964, Comput. J..

[50]  Yi Su,et al.  Large-scale random forest language models for speech recognition , 2007, INTERSPEECH.

[51]  S. Shamma,et al.  Spectro-temporal modulation transfer functions and speech intelligibility. , 1999, The Journal of the Acoustical Society of America.

[52]  Edward A. Lee,et al.  The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View , 2008 .

[53]  J.R. Bellegarda,et al.  Exploiting latent semantic information in statistical language modeling , 2000, Proceedings of the IEEE.

[54]  Tanja Schultz,et al.  Advances in the CMU/Interact Arabic GALE Transcription System , 2007, NAACL.

[55]  Hermann Ney,et al.  Multigram-based grapheme-to-phoneme conversion for LVCSR , 2003, INTERSPEECH.

[56]  Hermann Ney,et al.  Novel Reordering Approaches in Phrase-Based Statistical Machine Translation , 2005, ParallelText@ACL.

[57]  Andreas Stolcke,et al.  Recent innovations in speech-to-text transcription at SRI-ICSI-UW , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[58]  Johanna D. Moore,et al.  Proceedings of Interspeech 2008 , 2008 .

[59]  Hynek Hermansky,et al.  Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[60]  Andreas Stolcke,et al.  Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[61]  Hermann Ney,et al.  Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[62]  Brian Kingsbury,et al.  Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[63]  Dongxin Xu,et al.  The BBN Mandarin broadcast news transcription system , 2005, INTERSPEECH.

[64]  Andreas Stolcke,et al.  Improved discriminative training using phone lattices , 2005, INTERSPEECH.

[65]  Jean-Luc Gauvain,et al.  Investigating morphological decomposition for transcription of Arabic broadcast news and broadcast conversation data , 2008, INTERSPEECH.

[66]  Alex Pentland,et al.  Discriminative, generative and imitative learning , 2002 .

[67]  Steve J. Young,et al.  MMIE training of large vocabulary recognition systems , 1997, Speech Communication.

[68]  Georg Heigold,et al.  Modified MPE/MMI in a transducer-based framework , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[69]  Yiming Yang,et al.  Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization , 2003, ICML.

[70]  Geoffrey Zweig,et al.  fMPE: discriminatively trained features for speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[71]  Jean-Luc Gauvain,et al.  Arabic Broadcast News Transcription Using a One Million Word Vocalized Vocabulary , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[72]  Daniel Povey,et al.  Universal background model based speech recognition , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[73]  Hermann Ney,et al.  The RWTH Arabic-to-English spoken language translation system , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[74]  Smaranda Muresan,et al.  Generalizing Word Lattice Translation , 2008, ACL.

[75]  Tanja Schultz,et al.  Speaker segmentation and clustering in meetings , 2004, INTERSPEECH.

[76]  Hanna M. Wallach,et al.  Topic modeling: beyond bag-of-words , 2006, ICML.

[77]  Jan Cernocký,et al.  Probabilistic and Bottle-Neck Features for LVCSR of Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[78]  Geoffrey Zweig,et al.  Morpheme-Based Language Modeling for Arabic Lvcsr , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[79]  Geoffrey Zweig,et al.  An architecture for rapid decoding of large vocabulary conversational speech , 2003, INTERSPEECH.

[80]  Georg Heigold,et al.  On the equivalence of Gaussian and log-linear HMMs , 2008, INTERSPEECH.

[81]  Lori Lamel,et al.  Speaker-independent continuous speech dictation , 1993, Speech Communication.

[82]  Daniel Jurafsky,et al.  A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 , 2005, IJCNLP.

[83]  Jinyu Li,et al.  A study on soft margin estimation for LVCSR , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[84]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[85]  Richard Zens,et al.  Efficient Speech Translation Through Confusion Network Decoding , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[86]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[87]  Hermann Ney,et al.  Acoustic feature combination for robust speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[88]  Hermann Ney,et al.  Frame based system combination and a comparison with weighted ROVER and CNC , 2006, INTERSPEECH.

[89]  A. Castaño,et al.  Using Categories in the EUTRANS System , 1997 .

[90]  George Saon,et al.  Penalty function maximization for large margin HMM training , 2008, INTERSPEECH.

[91]  Richard M. Schwartz,et al.  Discriminatively Trained Region Dependent Feature Transforms for Speech Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[92]  Misha Pavel,et al.  Reconciliation of human and machine speech recognition performance , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[93]  Hermann Ney,et al.  Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[94]  Katrin Kirchhoff,et al.  Factored Neural Language Models , 2006, NAACL.

[95]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[96]  Andreas Stolcke,et al.  Within-class covariance normalization for SVM-based speaker recognition , 2006, INTERSPEECH.

[97]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[98]  Lukás Burget,et al.  The 2005 AMI System for the Transcription of Speech in Meetings , 2005, MLMI.

[99]  Jean-Luc Gauvain,et al.  Transcribing broadcast data using MLP features , 2008, INTERSPEECH.

[100]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[101]  Tanja Schultz,et al.  Automatic disfluency removal on recognized spontaneous speech - rapid adaptation to speaker-dependent disfluencies , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[102]  H. Schwenk,et al.  Efficient training of large neural networks for language modeling , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[103]  J.R. Bellegarda,et al.  Latent semantic mapping: dimensionality reduction via globally optimal continuous parameter modeling , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[104]  Georg Heigold,et al.  Recent improvements of the RWTH GALE Mandarin LVCSR system , 2008, INTERSPEECH.

[105]  Yuya Akita,et al.  Language Model Adaptation based on PLSA on Topics and Speakers , 2003 .

[106]  Jinxi Xu,et al.  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model , 2008, ACL.

[107]  George Saon,et al.  Maximum likelihood discriminant feature spaces , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[108]  Mark J. F. Gales,et al.  Context dependent language model adaptation , 2008, INTERSPEECH.

[109]  Jean-Luc Gauvain,et al.  Improved acoustic modeling for transcribing Arabic broadcast data , 2007, INTERSPEECH.

[110]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[111]  Tanja Schultz,et al.  Bilingual-LSA Based LM Adaptation for Spoken Language Translation , 2007, ACL.

[112]  T. Dau Modeling auditory processing of amplitude modulation , 1997 .

[113]  Hynek Hermansky,et al.  TRAPS - classifiers of temporal patterns , 1998, ICSLP.

[114]  Andreas Stolcke,et al.  An efficient repair procedure for quick transcriptions , 2004, INTERSPEECH.

[115]  Wu Chou,et al.  A unified approach of incorporating general features in decision tree based acoustic modeling , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[116]  Daniel P. W. Ellis,et al.  LP-TRAP: linear predictive temporal patterns , 2004, INTERSPEECH.

[117]  Stephen Cox,et al.  Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[118]  A. Aertsen,et al.  Spectro-temporal receptive fields of auditory neurons in the grassfrog , 1980, Biological Cybernetics.

[119]  Daniel Povey,et al.  Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[120]  Bing Xiang,et al.  Light supervision in acoustic model training , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[121]  Geoffrey Zweig,et al.  The IBM Mandarin Broadcast Speech Transcription System , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[122]  Frantisek Grézl,et al.  Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[123]  Philip C. Woodland,et al.  A PLSA-based language model for conversational telephone speech , 2004, INTERSPEECH.

[124]  Brian Kingsbury,et al.  Boosted MMI for model and feature-space discriminative training , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[125]  M.-Y. Tsai,et al.  Pronunciation Modeling With Reduced Confusion for Mandarin Chinese Using a Three-Stage Framework , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[126]  Fernando Pereira,et al.  Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..

[127]  Tanja Schultz,et al.  Sentence segmentation and punctuation recovery for spoken language translation , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[128]  Holger Schwenk,et al.  Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation , 2008, INTERSPEECH.

[129]  Alexander H. Waibel,et al.  Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition , 1997, EUROSPEECH.

[130]  Hermann Ney,et al.  On the integration of speech recognition and statistical machine translation , 2005, INTERSPEECH.

[131]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..

[132]  Tanja Schultz,et al.  Bilingual LSA-based adaptation for statistical machine translation , 2007, Machine Translation.

[133]  Feifan Liu,et al.  Unsupervised language model adaptation via topic modeling based on named entity hypotheses , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[134]  Günther Ruske,et al.  Discriminative training for continuous speech recognition , 1995, EUROSPEECH.

[135]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[136]  Tanja Schultz,et al.  Unsupervised language model adaptation using latent semantic marginals , 2006, INTERSPEECH.

[137]  Hermann Ney,et al.  Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation , 2003, CL.

[138]  Mark J. F. Gales,et al.  Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[139]  Mark J. F. Gales,et al.  The Cu-Htk Mandarin Broadcast News Transcription System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[140]  Brian Roark,et al.  MAP adaptation of stochastic grammars , 2006, Comput. Speech Lang..

[141]  Hermann Ney,et al.  Joint-sequence models for grapheme-to-phoneme conversion , 2008, Speech Commun..

[142]  David A. Cohn,et al.  Improving generalization with active learning , 1994, Machine Learning.

[143]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[144]  Ricky Ho Yin Chan,et al.  Improving broadcast news transcription by lightly supervised discriminative training , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[145]  Lin-Shan Lee,et al.  Improved Chinese broadcast news transcription by language modeling with temporally consistent training corpora and iterative phrase extraction , 2003, INTERSPEECH.

[146]  Daniel P. W. Ellis,et al.  Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[147]  Jean-Luc Gauvain,et al.  Building continuous space language models for transcribing european languages , 2005, INTERSPEECH.

[148]  Jean-Luc Gauvain,et al.  Lightly supervised and unsupervised acoustic model training , 2002, Comput. Speech Lang..

[149]  Hynek Hermansky,et al.  Multi-resolution RASTA filtering for TANDEM-based ASR , 2005, INTERSPEECH.

[150]  Dimitra Vergyri,et al.  Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition , 2004 .

[151]  Andreas Stolcke,et al.  The ICSI-SRI Spring 2006 Meeting Recognition System , 2006, MLMI.

[152]  Michael Kleinschmidt,et al.  Localized spectro-temporal features for automatic speech recognition , 2003, INTERSPEECH.

[153]  Mei-Yuh Hwang,et al.  Improved tone modeling for Mandarin broadcast news speech recognition , 2006, INTERSPEECH.

[154]  Sherif Abdou,et al.  Recent progress in Arabic broadcast news transcription at BBN , 2005, INTERSPEECH.

[155]  Sanjeev Khudanpur,et al.  Language model adaptation for automatic speech recognition and statistical machine translation , 2005 .

[156]  Hung-An Chang,et al.  Language model adaptation using latent dirichlet allocation and an efficient topic inference algorithm , 2007, INTERSPEECH.

[157]  Richard M. Schwartz,et al.  Unsupervised Training on Large Amounts of Broadcast News Data , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[158]  Stephan Vogel,et al.  Recent Improvements in the CMU Large Scale Chinese-English SMT System , 2008, ACL.

[159]  Fabio Valente,et al.  Combination of Acoustic Classifiers Based on Dempster-Shafer Theory of Evidence , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[160]  Geoffrey Zweig,et al.  Anatomy of an extremely fast LVCSR decoder , 2005, INTERSPEECH.

[161]  Jean-Luc Gauvain,et al.  Training Neural Network Language Models on Very Large Corpora , 2005, HLT.

[162]  William H. Press,et al.  Numerical recipes , 1990 .

[163]  Gerard G. L. Meyer,et al.  Selective sampling of training data for speech recognition , 2002 .

[164]  Daniel Povey,et al.  Improvements to fMPE for discriminative training of features , 2005, INTERSPEECH.

[165]  William J. Byrne,et al.  HMM Word and Phrase Alignment for Statistical Machine Translation , 2005, HLT.

[166]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[167]  Noah A. Smith,et al.  Proceedings of EMNLP , 2007 .

[168]  Yi Liu,et al.  Search and classification based language model adaptation , 2008, INTERSPEECH.

[169]  Andreas Stolcke,et al.  Morphology-based language modeling for conversational Arabic speech recognition , 2006, Comput. Speech Lang..

[170]  Richard M. Schwartz,et al.  Efficient 2-pass n-best decoder , 1997, EUROSPEECH.

[171]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[172]  Hervé Bourlard,et al.  A mew ASR approach based on independent processing and recombination of partial frequency bands , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[173]  Hermann Ney,et al.  ASR Word Lattice Translation with Exhaustive Reordering is Possible , 2008 .

[174]  Taro Watanabe,et al.  A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech recognition and Machine Translation , 2004, COLING.

[175]  Steven Greenberg,et al.  Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..

[176]  Fabio Valente,et al.  Hierarchical neural networks feature extraction for LVCSR system , 2007, INTERSPEECH.

[177]  Richard M. Schwartz,et al.  Progress in transcription of Broadcast News using Byblos , 2002, Speech Commun..

[178]  András Zolnay,et al.  Acoustic feature combination for speech recognition , 2006 .

[179]  George Saon,et al.  Data-driven approach to designing compound words for continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[180]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[181]  Hermann Ney,et al.  Improved Statistical Alignment Models , 2000, ACL.

[182]  Nelson Morgan,et al.  Corrected tandem features for acoustic model training , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[183]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[184]  A. Waibel,et al.  A one-pass decoder based on polymorphic linguistic context assignment , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[185]  Hermann Ney,et al.  Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[186]  Mark J. F. Gales,et al.  Training and adapting MLP features for Arabic speech recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[187]  George Saon,et al.  Lattice-based Viterbi decoding techniques for speech translation , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[188]  Jean-Luc Gauvain,et al.  Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[189]  Andreas Stolcke,et al.  Trapping conversational speech: extending TRAP/tandem approaches to conversational telephone speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[190]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[191]  Jean-Luc Gauvain,et al.  On the Use of MLP Features for Broadcast News Transcription , 2008, TSD.

[192]  Hermann Ney,et al.  Feature combination using linear discriminant analysis and its pitfalls , 2006, INTERSPEECH.

[193]  Tanja Schultz,et al.  The ISL RT04 Mandarin Broadcast News Evaluation System , 2004 .

[194]  Andreas Stolcke,et al.  Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[195]  Dietrich Klakow,et al.  Language model adaptation using dynamic marginals , 1997, EUROSPEECH.

[196]  Andreas Stolcke,et al.  Development of the SRI/nightingale Arabic ASR system , 2008, INTERSPEECH.

[197]  Sangita R. Sharma,et al.  Multi-stream approach to robust speech recognition , 1999 .

[198]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[199]  Mark J. F. Gales,et al.  Unsupervised Training for Mandarin Broadcast News and Conversation Transcription , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[200]  Thomas Hofmann,et al.  Topic-based language models using EM , 1999, EUROSPEECH.

[201]  Georg Heigold,et al.  Modified MMI/MPE: a direct evaluation of the margin in speech recognition , 2008, ICML '08.

[202]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[203]  Geoffrey Zweig,et al.  The IBM 2006 Gale Arabic ASR System , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[204]  Hagen Soltau,et al.  Phone dependent modeling of hyperarticulated effects# , 2000, INTERSPEECH.

[205]  J. E. Tree The Effects of False Starts and Repetitions on the Processing of Subsequent Words in Spontaneous Speech , 1995 .

[206]  Mark J. F. Gales,et al.  Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio , 2007, INTERSPEECH.

[207]  Marcus Tomalin,et al.  Discriminatively Trained Gaussian Mixture Models for Sentence Boundary Detection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[208]  Daniel Povey,et al.  Large scale MMIE training for conversational telephone speech recognition , 2000 .

[209]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[210]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[211]  David G. Stork,et al.  Pattern Classification , 1973 .

[212]  Ahmad Emami,et al.  Training Connectionist Models for the Structured Language Model , 2003, EMNLP.

[213]  Venkata Ramana Rao Gadde Modeling word durations , 2000, INTERSPEECH.

[214]  Jen-Tzung Chien,et al.  Latent dirichlet language model for speech recognition , 2008, 2008 IEEE Spoken Language Technology Workshop.

[215]  Thomas Hofmann,et al.  Hidden Markov Support Vector Machines , 2003, ICML.

[216]  D. D. Greenwood A cochlear frequency-position function for several species--29 years later. , 1990, The Journal of the Acoustical Society of America.

[217]  Misha Pavel,et al.  Towards ASR on partially corrupted speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[218]  Lee M. Miller,et al.  Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. , 2002, Journal of neurophysiology.

[219]  Yaser Al-Onaizan,et al.  Arabic ASR and MT Integration for GALE , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[220]  Tanja Schultz,et al.  Correlated Latent Semantic Model for Unsupervised LM Adaptation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[221]  James Demmel,et al.  Using PHiPAC to speed error back-propagation learning , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[222]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[223]  Misha Pavel,et al.  On the relative importance of various components of the modulation spectrum for automatic speech recognition , 1999, Speech Commun..

[224]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[225]  Frédéric Bimbot,et al.  Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[226]  Mark J. F. Gales,et al.  Speech Recognition System Combination for Machine Translation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[227]  David Gelbart,et al.  Ensemble Feature Selection for Multi-Stream Automatic Speech Recognition , 2008 .

[228]  Hermann Ney,et al.  Speech translation: coupling of recognition and translation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[229]  Jianfeng Gao,et al.  Extraction of Chinese Compound Words - An Experimental Study on a Very Large Corpus , 2000, ACL 2000.

[230]  Alexander H. Waibel,et al.  Unsupervised training of a speech recognizer using TV broadcasts , 1998, ICSLP.

[231]  Bowen Zhou,et al.  On Efficient Coupling of ASR and SMT for Speech Translation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[232]  Hermann Ney,et al.  Open vocabulary speech recognition with flat hybrid models , 2005, INTERSPEECH.

[233]  Tanja Schultz,et al.  Dynamic language model adaptation using variational Bayes inference , 2005, INTERSPEECH.

[234]  Rabih Zbib,et al.  Improved morphological decomposition for Arabic broadcast news transcription , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[235]  Lawrence K. Saul,et al.  Comparison of Large Margin Training to Other Discriminative Methods for Phonetic Recognition by Hidden Markov Models , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[236]  Andreas Stolcke,et al.  Prosody Modeling for Automatic Speech Recognition and Understanding , 2004 .

[237]  Ahmad Emami,et al.  Using a connectionist model in a syntactical based language model , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[238]  Hervé Bourlard,et al.  Continuous speech recognition , 1995, IEEE Signal Process. Mag..

[239]  Kai Feng,et al.  SUBSPACE GAUSSIAN MIXTURE MODELS FOR SPEECH RECOGNITION , 2009 .

[240]  Jean-Luc Gauvain,et al.  Dynamic language modeling for broadcast news , 2004, INTERSPEECH.

[241]  Jont B. Allen,et al.  Articulation and Intelligibility , 2005, Synthesis Lectures on Speech and Audio Processing.

[242]  Andreas Stolcke,et al.  Using MLP features in SRI's conversational speech recognition system , 2005, INTERSPEECH.