Gaining Knowledge through Knowledge: Ontology-Based Information Extraction

This article reports on the use of an ontology for automatic summarization that "goes the human way". The underlying idea is that human users can comprehend and integrate automatic summaries more easily if they share summarization principles and practices with the automatic summarizer. Our first real-world application is in bone marrow transplantation (BMT). At the core of the SummIt-BMT system, a domain ontology stored in a MySQL database provides knowledge for human users and system components alike. SummIt-BMT supports query formulation through an empirically founded scenario interface. Incoming retrieval results are pre-selected by a text retrieval component and submitted to agents that reflect the summarization strategies of competent humans. From the text passage retrieval result, the agents choose the sentences that best fit the user question, as evidenced by the ontology propositions occurring in them. The relevant text clips are entered into the answer version of the question scenario and presented with links to their home positions in the source documents. Summarization and information extraction are ontology-based: they use the relatively well-defined concepts for objects and properties and find evidence for relations between them with the help of paraphrases. The discussion concentrates on the ontology and its use for information extraction and question answering / summarization. The system agents are heavy users of the ontology. They typically fetch and combine different types of knowledge from the ontology database: concepts, propositions and their semanto-syntactic schemes, unifiers, paraphrases and query scenario forms. The main achievement of the agents is to keep only those text retrieval results that match the propositions of the user question, not merely by individual concepts but also by related units corresponding to phrases or sentences. Our first results are presented in the final section of the paper. They are not yet excellent, but quite good for a start-up team of agents and an ontology that is open for improvement.
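
The selection step described above, keeping a sentence only when it matches a whole proposition rather than isolated concepts, can be illustrated with a minimal Python sketch. It is not the SummIt-BMT implementation: the `Proposition` class, the function names, the plain string matching and the sample terms are all assumptions made for illustration, and the real system draws concepts and paraphrases from its ontology database.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """Hypothetical stand-in for an ontology proposition.

    Each slot lists the surface forms accepted for it: concept synonyms
    for subject and object, paraphrases for the relation.
    """
    subject_terms: tuple
    relation_paraphrases: tuple
    object_terms: tuple


def contains_any(sentence, terms):
    """True if at least one term occurs in the sentence as a whole word or phrase."""
    lowered = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )


def matches(sentence, prop):
    """A sentence is evidence for a proposition only if all three slots are found."""
    return (
        contains_any(sentence, prop.subject_terms)
        and contains_any(sentence, prop.relation_paraphrases)
        and contains_any(sentence, prop.object_terms)
    )


def select_sentences(passage, prop):
    """Split a retrieved passage into sentences and keep the matching ones."""
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [s for s in sentences if matches(s, prop)]


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper or its ontology.
    prop = Proposition(
        subject_terms=("cyclosporine", "ciclosporin"),
        relation_paraphrases=("prevents", "reduces the risk of"),
        object_terms=("GVHD", "graft-versus-host disease"),
    )
    passage = (
        "Cyclosporine was given to all patients. "
        "Cyclosporine reduces the risk of graft-versus-host disease. "
        "GVHD occurred in some patients."
    )
    print(select_sentences(passage, prop))
```

In this toy passage only the second sentence is kept: the first mentions the subject concept alone and the third the object concept alone, so neither counts as evidence for the proposition.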
