Optimization Algorithms and Applications for Speech and Language Processing

Optimization techniques have been used for many years in the formulation and solution of computational problems arising in speech and language processing. Such techniques are found, for example, in the Baum-Welch, extended Baum-Welch (EBW), Rprop, and generalized iterative scaling (GIS) algorithms. Additionally, the use of regularization terms has been seen in other applications of sparse optimization. This paper outlines a range of problems in which optimization formulations and algorithms play a role, and gives additional details on selected applications in machine translation, speaker/language recognition, and automatic speech recognition. Several approaches developed in the speech and language processing communities are described in a way that makes them more recognizable as optimization procedures. Our survey is not exhaustive and is complemented by other papers in this volume.
