
Inferring parts of speech for lexical mappings via the Cyc KB

We present an automatic approach to learning criteria for classifying the parts-of-speech used in lexical mappings. This will further automate our knowledge acquisition system for non-technical users. The criteria for the speech parts are based on the types of the denoted terms along with morphological and corpus-based clues. Associations among these and the parts-of-speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 speech parts to choose from, the classifier achieves good results (77.8% correct). Accurate results (93.0%) are achieved in the special case of the mass-count distinction for nouns. Comparable results are also obtained using OpenCyc (73.1% general and 88.4% mass-count).

Michael J. Witbrock | Dave Schneider | Tom O'Hara | Stefano Bertolo | Bjørn Aldag | Jon Curtis | Kathy Panton | Nancy Salay
