State of the art in statistical methods for language and speech processing

Recent years have seen rapid growth in the deployment of statistical methods for computational language and speech processing. The current popularity of such methods can be traced to the convergence of several factors, including the increasing amount of data now accessible, sustained advances in computing power and storage capabilities, and ongoing improvements in machine learning algorithms. The purpose of this contribution is to review the state of the art in both areas, point out the top trends in statistical modelling across a wide range of problems, and identify their most salient characteristics. The paper concludes with some prognostications regarding the likely impact on the field going forward.
