Labeling by landscaping: classifying tokens in context by pruning and decorating trees

State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task that ignores complex structural characteristics of the input, or as a sequential labeling task carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways of bringing structure to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework that deploys tree kernels. We then propose tree transformations and decorations as a methodology for modeling complex linguistic phenomena in high-dimensional feature spaces. We develop a general-purpose tree engineering framework, which lets us avoid the typically complex and laborious process of feature engineering. We build kernel-based classifiers for two token labeling tasks: fine-grained event recognition and lexical answer type detection in questions. For both tasks, we show that our tree kernel method improves recognition over a corresponding linear kernel SVM, thanks to tree structures engineered appropriately for use by the tree kernel. We also observe significant improvements over a CRF-based realization of structured prediction, which itself performs at levels comparable to the state of the art.
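To make the idea of decorating a tree for use by a tree kernel concrete, the following is a minimal sketch, not the framework described above. It assumes a subset tree kernel in the style of Collins and Duffy's convolution kernels, a nested-tuple tree encoding, and a single illustrative decoration that marks the preterminal above the token to be labeled. The names `tree_kernel`, `decorate`, and the `-TARGET` suffix are hypothetical choices made for this example.

```python
# Illustrative sketch only: the tree encoding, function names and the
# "-TARGET" decoration are assumptions, not the paper's actual framework.
# A tree is a nested tuple (label, child, ...); a word is a plain string.

def nodes(tree):
    """Yield every internal node (tuple) of the tree, top-down."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Node label plus its children, keeping words and labels distinct."""
    children = tuple(("N", c[0]) if isinstance(c, tuple) else ("W", c)
                     for c in node[1:])
    return (node[0],) + children

def common_fragments(n1, n2, decay):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):  # equal productions: c2 is a tuple too
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Subset tree kernel: sum of shared fragments over all node pairs."""
    return sum(common_fragments(a, b, decay)
               for a in nodes(t1) for b in nodes(t2))

def decorate(tree, target, mark="-TARGET"):
    """Copy the tree, suffixing the preterminal above token number `target`."""
    def walk(node, start):
        if not isinstance(node, tuple):
            return node, start + 1  # a word consumes one token position
        position, children = start, []
        for child in node[1:]:
            new_child, position = walk(child, position)
            children.append(new_child)
        is_preterminal = all(not isinstance(c, tuple) for c in node[1:])
        hit = is_preterminal and start == target and position == target + 1
        return (node[0] + mark if hit else node[0],) + tuple(children), position
    return walk(tree, 0)[0]

sentence = ("S",
            ("NP", ("DT", "the"), ("NN", "attack")),
            ("VP", ("VBD", "failed")))
marked = decorate(sentence, target=1)  # token 1 is "attack"
plain_similarity = tree_kernel(sentence, sentence)
marked_similarity = tree_kernel(marked, sentence)
```

In this sketch the decoration changes which fragments two trees share, so the same sentence yields a different kernel value depending on which token is marked. That is one simple way a token in context could be presented to a kernel-based classifier; a kernel function of this kind could then be supplied to an SVM implementation that accepts precomputed kernel matrices.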
