A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis

This paper describes a case-based approach to knowledge acquisition for natural language systems that simultaneously learns part-of-speech, word-sense, and concept-activation knowledge for all open-class words in a corpus. The parser begins with a lexicon of function words and creates a case base of context-sensitive word definitions during a human-supervised training phase. Then, given an unknown word and the context in which it occurs, the parser retrieves definitions from the case base to infer the word's syntactic and semantic features. Because context is encoded as part of each definition, the meaning of a word can change dynamically in response to surrounding phrases without the need for explicit lexical disambiguation heuristics. Moreover, the approach acquires all three classes of knowledge using the same case representation, and it requires relatively little training and no hand-coded knowledge acquisition heuristics. We evaluate the approach in experiments that explore two of its many practical applications and conclude that the case-based method provides a promising approach to automated dictionary construction and knowledge acquisition for sentence analysis in limited domains. In addition, we present a novel case retrieval algorithm that uses decision trees to improve the performance of a k-nearest neighbor similarity metric.
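The retrieval step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the case base, the feature names (`prev_pos`, `next_pos`, `prev_word`), the slot names, and the overlap-count similarity are all hypothetical. It also assumes that the decision tree's role is to supply a subset of relevant context features for the k-nearest neighbor match; here that subset is given by hand rather than learned.

```python
from collections import Counter

# Hypothetical case base: each case pairs the context in which a word
# occurred with the definition assigned during supervised training.
CASE_BASE = [
    ({"prev_pos": "det", "next_pos": "verb", "prev_word": "the"},
     {"pos": "noun", "sense": "human", "concept": "perpetrator"}),
    ({"prev_pos": "det", "next_pos": "verb", "prev_word": "a"},
     {"pos": "noun", "sense": "human", "concept": "perpetrator"}),
    ({"prev_pos": "noun", "next_pos": "det", "prev_word": "rebels"},
     {"pos": "verb", "sense": "attack", "concept": "bombing"}),
    ({"prev_pos": "det", "next_pos": "prep", "prev_word": "the"},
     {"pos": "noun", "sense": "location", "concept": "target"}),
    ({"prev_pos": "noun", "next_pos": "noun", "prev_word": "guerrillas"},
     {"pos": "verb", "sense": "attack", "concept": "bombing"}),
]

SLOTS = ("pos", "sense", "concept")


def similarity(query, context, features):
    """Count the selected features on which query and stored context agree."""
    return sum(1 for f in features if query.get(f) == context.get(f))


def retrieve(query, case_base, features, k=3):
    """Return the definitions of the k stored cases most similar to the query."""
    ranked = sorted(
        case_base,
        key=lambda case: similarity(query, case[0], features),
        reverse=True,
    )
    return [definition for _, definition in ranked[:k]]


def infer_definition(query, case_base, features, k=3):
    """Fill each definition slot by majority vote over the retrieved cases."""
    neighbors = retrieve(query, case_base, features, k)
    return {
        slot: Counter(n[slot] for n in neighbors).most_common(1)[0][0]
        for slot in SLOTS
    }


# Assumption: these are the features a decision tree judged relevant.
SELECTED_FEATURES = ("prev_pos", "next_pos")

# Context of an unknown word, e.g. "the <unknown> attacked ...".
QUERY = {"prev_pos": "det", "next_pos": "verb", "prev_word": "the"}

print(infer_definition(QUERY, CASE_BASE, SELECTED_FEATURES, k=3))
```

Because all three slots are filled from the same retrieved cases, one case representation serves part-of-speech, word-sense, and concept-activation prediction alike; restricting the match to a feature subset keeps irrelevant context from dominating the similarity score.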
