Statistical Machine Translation for Speech: A Perspective on Structures, Learning & Decoding

We survey and analyze state-of-the-art statistical machine translation (SMT) techniques for speech translation (ST). We review key learning problems and investigate essential model structures in SMT, taking a unified perspective that reveals both the connections and the contrasts between automatic speech recognition (ASR) and SMT. We show that phrase-based SMT can be viewed as a sequence of finite-state transducer (FST) operations, similar in spirit to ASR. We further examine the synchronous context-free grammar (SCFG) formalism, which covers hierarchical phrase-based models and many linguistically motivated syntax-based models. Decoding for ASR, FST-based translation, and SCFG-based translation is also presented from a unified perspective, as different realizations of the generic Viterbi algorithm on graphs or hypergraphs. These consolidated perspectives can help catalyze tighter integration for improved ST, and we discuss joint decoding and modeling as ways of coupling ASR and SMT.
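As an illustration of the unified decoding view, the following is a minimal Python sketch of a generic Viterbi pass over a directed hypergraph. It is not taken from the paper: the function name `viterbi`, the `(head, tails, log_weight)` edge encoding, and the toy weights are all assumptions made for this example. An ordinary graph (for instance an ASR or FST lattice) is treated as the special case in which every hyperedge has at most one tail, while an SCFG-style derivation uses hyperedges with several tails.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding: (head node, tuple of tail nodes, log-weight).
# A hyperedge with no tails marks an axiom (source) node.
Hyperedge = Tuple[Hashable, Tuple[Hashable, ...], float]


def viterbi(
    nodes_topo: Sequence[Hashable], edges: List[Hyperedge]
) -> Tuple[Dict[Hashable, float], Dict[Hashable, Tuple[Hashable, ...]]]:
    """Best (max log-weight) derivation score for each reachable node.

    Assumes `nodes_topo` lists nodes in topological order, so every
    tail of a hyperedge appears before its head.
    """
    incoming: Dict[Hashable, List[Tuple[Tuple[Hashable, ...], float]]] = {
        v: [] for v in nodes_topo
    }
    for head, tails, weight in edges:
        incoming[head].append((tails, weight))

    best: Dict[Hashable, float] = {}
    back: Dict[Hashable, Tuple[Hashable, ...]] = {}
    for v in nodes_topo:
        for tails, weight in incoming[v]:
            # Skip hyperedges whose tails have no derivation.
            if any(u not in best for u in tails):
                continue
            score = weight + sum(best[u] for u in tails)
            if v not in best or score > best[v]:
                best[v] = score
                back[v] = tails  # back-pointer to the chosen tails
    return best, back


# Graph case: every edge has at most one tail (a small lattice).
lattice_edges: List[Hyperedge] = [
    ("s", (), 0.0),
    ("a", ("s",), -1.0),
    ("b", ("s",), -2.0),
    ("t", ("a",), -3.0),
    ("t", ("b",), -1.0),
]
lattice_best, lattice_back = viterbi(["s", "a", "b", "t"], lattice_edges)

# Hypergraph case: one edge combines two sub-derivations, as an SCFG rule would.
forest_edges: List[Hyperedge] = [
    ("X1", (), -1.0),
    ("X2", (), -0.5),
    ("S", ("X1", "X2"), -0.25),
    ("S", ("X1",), -2.0),
]
forest_best, forest_back = viterbi(["X1", "X2", "S"], forest_edges)
```

The same loop serves both cases; only the number of tails per edge differs. The sketch leaves out language-model integration, pruning, and k-best extraction.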
