Linguistic Structure Prediction

A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure, and we seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. The focus is on approaches to decoding (i.e., carrying out linguistic structure prediction) and on supervised and unsupervised learning of models that predict discrete structures as outputs. We also survey natural language processing problems to which these methods are being applied, and we address related topics in probabilistic inference, optimization, and experimental methodology.

Table of Contents: Representations and Linguistic Data / Decoding: Making Predictions / Learning Structure from Annotated Data / Learning Structure from Incomplete Data / Beyond Decoding: Inference
