Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights

Aspectual classification maps verbs to a small set of primitive categories in order to reason about time. This classification is necessary for interpreting temporal modifiers and assessing temporal relationships, and is therefore a required component for many natural language applications. A verb's aspectual category can be predicted by co-occurrence frequencies between the verb and certain linguistic modifiers. These frequency measures, called linguistic indicators, are chosen on the basis of linguistic insights. However, any single linguistic indicator is predictively incomplete and is therefore insufficient when used in isolation. In this article, we compare three supervised machine learning methods for combining multiple linguistic indicators for aspectual classification: decision trees, genetic programming, and logistic regression. A set of 14 indicators is combined for classification according to two aspectual distinctions. This approach improves the classification performance for both distinctions, as evaluated over unrestricted sets of verbs occurring across two corpora. This result demonstrates the effectiveness of the linguistic indicators and provides a much-needed full-scale method for automatic aspectual classification. Moreover, the models resulting from learning reveal several linguistic insights that are relevant to aspectual classification. We also compare supervised learning methods with an unsupervised method for this task.
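
The idea of combining indicators can be illustrated with a minimal sketch; it is not the article's implementation. It computes each indicator as the fraction of a verb's clauses containing a given marker, then combines two indicators with a small logistic regression fitted by gradient descent. The marker names (`progressive`, `manner_adv`), the toy clauses, the binary labels, and all function names are hypothetical and chosen only for illustration.

```python
import math

def indicator_value(verb_clauses, has_marker):
    """Fraction of a verb's clauses in which a linguistic marker occurs."""
    if not verb_clauses:
        return 0.0
    return sum(1 for c in verb_clauses if has_marker(c)) / len(verb_clauses)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Probability of class 1; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: each clause is the set of markers observed in it.
clauses = {
    "know": [{"present"}, {"present"}, {"past"}, {"present"}],
    "seem": [{"present"}, {"present"}, {"present", "not"}, {"past"}],
    "run": [{"progressive"}, {"past"}, {"progressive", "manner_adv"}, {"past"}],
    "build": [{"progressive"}, {"past", "manner_adv"}, {"progressive"}, {"perfect"}],
}
# Assumed binary labels for one aspectual distinction (1 vs. 0), made up here.
labels = {"know": 1, "seem": 1, "run": 0, "build": 0}
markers = ["progressive", "manner_adv"]  # illustrative indicators only

verbs = sorted(clauses)
rows = [
    [indicator_value(clauses[v], lambda c, m=m: m in c) for m in markers]
    for v in verbs
]
weights = train_logistic(rows, [labels[v] for v in verbs])

for verb, row in zip(verbs, rows):
    print(verb, row, round(predict(weights, row), 3))
```

With real corpora, each row would hold one value per indicator (14 in the article), and the fitted weights would show how much each indicator contributes to the prediction.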
