Syntactic Parsing as a Knowledge Acquisition Problem

Corpus linguistics involves the construction and annotation of large databases of text drawn from spoken and written language. These corpora have applications in natural language processing (NLP) and in the teaching of grammar. Annotating them poses the knowledge acquisition (KA) “bottleneck” problem in a new application area. This paper introduces parse checking as a KA problem and compares it to other tree-oriented KA methodologies such as laddering and clustering. It argues that corpus linguistics represents a significant application area for KA. The laddering tools discussed here have been used to process thousands of tree structures. The paper compares two tools in use on the ICE-GB corpus. One tool, ICE Tree II, exploits the structure of grammatical trees more fully than the other. Timing results show a main learning effect that dominates any difference between the two tools. However, the more integrated tool reduces the scope for error.
