Learning by Reading: A Prototype System, Performance Baseline and Lessons Learned

A traditional goal of Artificial Intelligence research has been a system that can read unrestricted natural language texts on a given topic, build a model of that topic, and reason over the model. Advances in syntax and semantics within Natural Language Processing have made it possible to extract a limited form of meaning from sentences. Knowledge Representation research has shown that it is possible to model and reason over topics in interesting areas of human knowledge. It is useful for these two communities to reunite periodically to assess progress toward the common goal of text understanding. In this paper, we describe a coordinated effort among researchers from the Natural Language and Knowledge Representation and Reasoning communities. We routed the output of existing NL software into existing KR software to extract knowledge from texts for integration with engineered knowledge bases. We tested the system on a suite of roughly 80 small English texts about the form and function of the human heart, as well as a handful of "confuser" texts from other domains, and then manually evaluated the knowledge extracted from novel texts. We conclude that the technology from these fields is mature enough to begin producing unified machine reading systems. The results of our exercise provide a performance baseline for systems attempting to acquire models from text.
