Inferring parts of speech for lexical mappings via the Cyc KB

We present an automatic approach to learning criteria for classifying the parts of speech used in lexical mappings, which will further automate our knowledge acquisition system for non-technical users. The criteria are based on the types of the denoted terms, together with morphological and corpus-based clues. Associations between these clues and the parts of speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 parts of speech to choose from, the classifier achieves 77.8% accuracy. In the special case of the mass-count distinction for nouns, accuracy reaches 93.0%. Comparable results are obtained using OpenCyc (73.1% in the general case and 88.4% for mass-count).
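The general idea of learning associations between clues and parts of speech can be illustrated with a minimal sketch. It is not the classifier used in this work: it swaps in a simple feature-voting scheme, and the training entries, type names, part-of-speech labels, and the three-letter suffix feature are all hypothetical stand-ins for the Cyc lexical mappings and the morphological clues.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (word, types of the denoted term, part of speech).
# Type names and labels are illustrative only, not taken from the Cyc KB.
TRAINING = [
    ("water", {"LiquidTangibleThing"}, "MassNoun"),
    ("sand", {"GranularStuff"}, "MassNoun"),
    ("dog", {"Animal"}, "CountNoun"),
    ("table", {"Artifact"}, "CountNoun"),
    ("cat", {"Animal"}, "CountNoun"),
]


def features(word, types):
    """Combine ontological-type clues with a crude morphological clue (assumed: last 3 letters)."""
    feats = {f"type={t}" for t in types}
    feats.add(f"suffix={word[-3:]}")
    return feats


def train(examples):
    """Count how often each feature co-occurs with each part of speech."""
    counts = defaultdict(Counter)
    labels = Counter()
    for word, types, pos in examples:
        labels[pos] += 1
        for feat in features(word, types):
            counts[feat][pos] += 1
    return counts, labels


def classify(model, word, types):
    """Sum the per-feature counts as votes; fall back to the most frequent label."""
    counts, labels = model
    votes = Counter()
    for feat in features(word, types):
        votes.update(counts.get(feat, {}))
    if not votes:
        return labels.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


model = train(TRAINING)
print(classify(model, "milk", {"LiquidTangibleThing"}))  # MassNoun
print(classify(model, "horse", {"Animal"}))              # CountNoun
```

A real system would draw the types from the knowledge base's ontology and would add corpus-based clues as further features; the voting step could be replaced by any standard supervised learner.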
