Large Scale Experiments with Naive Bayes and Decision Trees for Function Tagging

This paper describes the use of two machine learning techniques, naive Bayes and decision trees, to assign function tags to nodes in a syntactic parse tree. Function tags are additional functional labels, such as logical subject or predicate, that can be attached to certain nodes in a syntactic parse tree. We model function tag assignment as a classification problem. Each function tag is treated as a class, and the task is to determine which tag from a predefined set a given parse-tree node belongs to. The paper offers the first systematic comparison of naive Bayes and decision trees for function tag assignment, based on a standardized data set.
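The sketch below illustrates the classification framing by training both kinds of classifier on a tiny hand-made set of parse-tree nodes. Several parts of it are hypothetical and chosen only for illustration: the three features (node label, parent label, position relative to the verb), the toy training data, and all function names. None of these is the feature set, corpus, or implementation used in the paper. The naive Bayes model is categorical with add-one smoothing, and the tree is a plain ID3-style learner that uses information gain. Neither is necessarily the exact variant the authors evaluated.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each parse-tree node is described by
# (node label, parent label, position relative to the verb) and paired with
# its function tag. Illustrative only; not the paper's features or corpus.
train = [
    (("NP", "S", "before"), "SBJ"),
    (("NP", "S", "before"), "SBJ"),
    (("NP", "S", "before"), "SBJ"),
    (("NP", "VP", "after"), "PRD"),
    (("ADJP", "VP", "after"), "PRD"),
    (("PP", "VP", "after"), "LOC"),
    (("PP", "VP", "after"), "LOC"),
    (("PP", "S", "before"), "TMP"),
    (("NP", "VP", "after"), "TMP"),
]


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, data):
        self.n = len(data)
        self.class_counts = Counter(tag for _, tag in data)
        # feat_counts[tag][i][value]: how often feature i had `value` for nodes with `tag`
        self.feat_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values observed per feature index
        for feats, tag in data:
            for i, v in enumerate(feats):
                self.feat_counts[tag][i][v] += 1
                self.values[i].add(v)
        return self

    def log_score(self, feats, tag):
        """log P(tag) + sum of log P(feature | tag); proportional to the posterior."""
        c = self.class_counts[tag]
        score = math.log(c / self.n)  # log prior
        for i, v in enumerate(feats):
            k = len(self.values[i]) + 1  # +1 reserves probability mass for unseen values
            score += math.log((self.feat_counts[tag][i][v] + 1) / (c + k))
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda tag: self.log_score(feats, tag))


def entropy(tags):
    total = len(tags)
    return -sum(c / total * math.log2(c / total) for c in Counter(tags).values())


def build_tree(data, features):
    """ID3-style tree: split on the feature with the highest information gain.

    Internal nodes are (feature_index, branches, majority_tag); leaves are tags.
    """
    tags = [tag for _, tag in data]
    majority = Counter(tags).most_common(1)[0][0]
    if len(set(tags)) == 1 or not features:
        return majority

    def info_gain(i):
        groups = defaultdict(list)
        for feats, tag in data:
            groups[feats[i]].append(tag)
        remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
        return entropy(tags) - remainder

    best = max(features, key=info_gain)
    rest = [i for i in features if i != best]
    branches = {}
    for value in {feats[best] for feats, _ in data}:
        subset = [(f, t) for f, t in data if f[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches, majority)


def tree_predict(tree, feats):
    while isinstance(tree, tuple):
        best, branches, majority = tree
        if feats[best] not in branches:
            return majority  # unseen feature value: fall back to the majority tag here
        tree = branches[feats[best]]
    return tree


nb = NaiveBayes().fit(train)
tree = build_tree(train, [0, 1, 2])
for node in [("NP", "S", "before"), ("PP", "VP", "after"), ("SBAR", "VP", "after")]:
    print(node, nb.predict(node), tree_predict(tree, node))
```

The two models use the same evidence differently. Naive Bayes scores every candidate tag using all features at once, under the assumption that features are independent given the tag. The tree tests one feature at a time and falls back to the majority tag when it encounters a value it never saw during training.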