Coarse-to-Fine Natural Language Processing

State-of-the-art natural language processing models are anything but compact. Syntactic parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. With such complexity come two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them?

Hierarchical coarse-to-fine methods address both questions. Coarse-to-fine approaches exploit a sequence of models that introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done incrementally. In this dissertation, we describe several coarse-to-fine systems.

In the domain of syntactic parsing, complexity is in the grammar. We present a latent variable approach that begins with an X-bar grammar and learns to iteratively refine grammar categories. For example, noun phrases might be split into subcategories for subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme that reduces parsing times by orders of magnitude. Furthermore, it produces the best parsing accuracies across an array of languages, in a fully language-general fashion.

In the domain of acoustic modeling for speech recognition, complexity is needed to model the rich phonetic properties of natural languages. Starting from a monophone model, we automatically learn increasingly refined models that capture phone-internal structure as well as context-dependent variation. Our approach reduces error rates compared to other baseline approaches, while streamlining the learning procedure.

In the domain of machine translation, complexity arises because there are too many target language word types. To manage this complexity, we translate into target language clusterings of increasing vocabulary size. This approach gives dramatic speed-ups while additionally increasing final translation quality.
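The general idea of refining a cheap model into an expensive one can be illustrated with a toy pruning pass. The following is a minimal sketch under stated assumptions, not the inference scheme of the dissertation: the function `coarse_to_fine`, the two-level `hierarchy`, the score tables, and the `threshold` value are all hypothetical and chosen only for illustration. A coarse model scores a handful of categories, categories below the threshold are discarded, and the fine model is evaluated only on refinements of the survivors.

```python
from typing import Callable, Dict, List, Optional, Tuple


def coarse_to_fine(
    hierarchy: Dict[str, List[str]],
    coarse_score: Callable[[str], float],
    fine_score: Callable[[str], float],
    threshold: float,
) -> Tuple[Optional[str], int]:
    """Return the best surviving fine label and the number of fine evaluations.

    Pruning is approximate: a fine label whose coarse parent falls below
    the threshold is never evaluated, even if it would have scored well.
    """
    # Coarse pass: cheap, one evaluation per coarse category.
    survivors = [c for c in hierarchy if coarse_score(c) >= threshold]

    # Fine pass: expensive, restricted to refinements of the survivors.
    best_label: Optional[str] = None
    best_score = float("-inf")
    evaluations = 0
    for coarse in survivors:
        for fine in hierarchy[coarse]:
            evaluations += 1
            score = fine_score(fine)
            if score > best_score:
                best_label, best_score = fine, score
    return best_label, evaluations


# Hypothetical two-level hierarchy: each coarse category splits in two.
hierarchy = {
    "NP": ["NP-subj", "NP-obj"],
    "VP": ["VP-fin", "VP-inf"],
    "PP": ["PP-loc", "PP-tmp"],
}
# Made-up scores standing in for the coarse and fine models.
coarse_scores = {"NP": 0.70, "VP": 0.25, "PP": 0.05}
fine_scores = {
    "NP-subj": 0.50, "NP-obj": 0.20,
    "VP-fin": 0.20, "VP-inf": 0.05,
    "PP-loc": 0.03, "PP-tmp": 0.02,
}

best, evaluations = coarse_to_fine(
    hierarchy, coarse_scores.__getitem__, fine_scores.__getitem__, threshold=0.1
)
print(best, evaluations)
```

In this toy setting the coarse pass removes `PP`, so the fine model is evaluated on four labels instead of six. With deeper hierarchies the same filtering can be repeated at every level, which is where larger savings would be expected.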
