Effective statistical models for syntactic and semantic disambiguation

This thesis focuses on building effective statistical models for disambiguating complex syntactic and semantic natural language (NL) structures. We advance the state of the art in several domains in two ways. First, we choose representations that encode domain knowledge more effectively. Second, we develop machine learning algorithms that address the specific properties of NL disambiguation tasks: sparse training data and large, structured spaces of hidden labels. For syntactic disambiguation, we propose a novel representation of parse trees that connects the words of the sentence directly with the hidden syntactic structure. Experiments on parse selection for a Head-Driven Phrase Structure Grammar (HPSG) show that the new representation outperforms previous models. For disambiguating the semantic role structure of verbs, we build a more accurate model. It captures the knowledge that the semantic frame of a verb is a joint structure with strong dependencies between its arguments. We achieve this with a Conditional Random Field that makes no Markov independence assumptions on the sequence of semantic role labels. To address the sparsity problem in machine learning for NL, we develop a method that incorporates many additional sources of information by using Markov chains in the space of words. The Markov chain framework makes it possible to combine multiple knowledge sources, to learn how much to trust each of them, and to chain inferences together. It achieves large gains in disambiguating prepositional phrase attachment.
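
Scoring a verb's semantic frame as one joint structure can be illustrated with a small sketch. It is not the thesis's model. The sketch assumes a local score for each (argument node, label) pair and adds hypothetical global features that inspect the entire label sequence: the full core-argument pattern and a flag for a repeated core role. It also normalizes over a short explicit list of candidate labelings, for example the top few from a local model, instead of over all sequences. The names `global_features`, `score` and `joint_distribution`, the feature names, the weights and the scores are invented for illustration. In practice the weights would be learned.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical setup: a labeling assigns one role label to each argument node
# of a verb, in sentence order.
Labeling = Sequence[str]


def global_features(labels: Labeling) -> Dict[str, float]:
    """Features over the whole label sequence at once (no Markov factorization)."""
    core = [lab for lab in labels if lab.startswith("ARG") and lab[3:].isdigit()]
    return {
        "frame=" + "_".join(core): 1.0,  # the entire core-argument pattern
        "repeated_core": float(len(core) != len(set(core))),  # e.g. two ARG0s
    }


def score(labels: Labeling, local: List[Dict[str, float]], weights: Dict[str, float]) -> float:
    """Sum of per-node local scores plus weighted global features."""
    total = sum(local[i][lab] for i, lab in enumerate(labels))
    total += sum(weights.get(f, 0.0) * v for f, v in global_features(labels).items())
    return total


def joint_distribution(candidates: List[Labeling], local: List[Dict[str, float]],
                       weights: Dict[str, float]) -> List[float]:
    """Log-linear (softmax) distribution over an explicit list of full labelings."""
    scores = [score(c, local, weights) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Two argument nodes; the local scores alone slightly prefer labeling both ARG0.
local_scores = [{"ARG0": 1.0, "ARG1": 0.2}, {"ARG0": 1.1, "ARG1": 1.0}]
candidates = [("ARG0", "ARG0"), ("ARG0", "ARG1"), ("ARG1", "ARG0")]
weights = {"repeated_core": -2.0, "frame=ARG0_ARG1": 0.5}  # hand-set, normally learned

local_only = joint_distribution(candidates, local_scores, {})
joint = joint_distribution(candidates, local_scores, weights)
```

With no global weights the model picks the first labeling, which has two ARG0s. The joint model picks ("ARG0", "ARG1"). `global_features` sees the whole sequence, so it can penalize a repeated core role or reward a complete frame pattern however far apart the arguments are. A chain of factors over adjacent labels cannot express that directly.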

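The Markov chain idea for sparse data can also be illustrated with a small sketch. It is not the thesis's exact model. The sketch assumes that each knowledge source is given as a row-stochastic transition matrix over a toy vocabulary. The two sources (morphology and synonymy links), the halting probability, and the names `mix_transitions` and `walk_distribution` are hypothetical. The mixture weights stand for how much each source is trusted. In the framework described above these weights are learned, but here they are fixed by hand.

```python
from typing import List

Matrix = List[List[float]]


def mix_transitions(sources: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted mixture of row-stochastic transition matrices, one per knowledge source."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    n = len(sources[0])
    return [
        [sum(w * src[i][j] for w, src in zip(weights, sources)) for j in range(n)]
        for i in range(n)
    ]


def walk_distribution(start: int, trans: Matrix, stop_prob: float, max_steps: int = 50) -> List[float]:
    """Distribution over the word at which a random walk from `start` halts.

    Before each step the walk halts with probability stop_prob; otherwise it
    follows one transition, so longer walks chain several inferences together.
    Truncation after max_steps drops (1 - stop_prob) ** max_steps of the mass.
    """
    n = len(trans)
    running = [0.0] * n  # mass of walks still in progress, by current word
    running[start] = 1.0
    halted = [0.0] * n  # mass of walks that have halted, by final word
    for _ in range(max_steps):
        for i in range(n):
            halted[i] += stop_prob * running[i]
        running = [
            (1.0 - stop_prob) * sum(running[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return halted


# Toy vocabulary and two hypothetical knowledge sources (each row sums to 1).
vocab = ["ate", "eat", "devour"]
morphology = [
    [0.0, 1.0, 0.0],  # ate -> eat (same lemma)
    [1.0, 0.0, 0.0],  # eat -> ate
    [0.0, 0.0, 1.0],  # devour: no morphological neighbour, stays put
]
synonymy = [
    [1.0, 0.0, 0.0],  # ate: no synonym listed, stays put
    [0.0, 0.0, 1.0],  # eat -> devour
    [0.0, 1.0, 0.0],  # devour -> eat
]

start = vocab.index("ate")
morph_only = walk_distribution(start, mix_transitions([morphology, synonymy], [1.0, 0.0]), 0.5)
mixed = walk_distribution(start, mix_transitions([morphology, synonymy], [0.5, 0.5]), 0.5)
```

With morphology alone, no walk from "ate" ever reaches "devour". Mixing in the synonymy source lets a two-step walk (ate → eat → devour) give it positive probability. This shows how chaining inferences across sources can smooth an estimate that one source alone would leave at zero.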