Extending Knowledge and Deepening Linguistic Processing for the Question Answering System InSicht

The German question answering (QA) system InSicht participated in QA@CLEF for the second time. It relies on complete sentence parsing, inferences, and semantic representation matching. This year, the system was improved in two main directions. First, the background knowledge was extended by large semantic networks and large rule sets. Second, linguistic processing was deepened by treating a phenomenon that appears prominently on the level of text semantics: coreference resolution. A new source of lexico-semantic relations and equivalence rules has been established based on compound analyses from document parses. These analyses were used in three ways: to project lexico-semantic relations from compound parts to compounds, to establish a subordination hierarchy for compounds, and to derive equivalence rules between nominal compounds and their analytic counterparts. The lack of coreference resolution in InSicht was one major source of missing answers in QA@CLEF 2004. Therefore, the coreference resolution module CORUDIS was integrated into the parsing during document processing. The central step in the QA system InSicht, matching semantic networks derived from the question parse (one by one) with document sentence networks, was generalized: a question network can now be split at certain semantic relations (e.g. relations for local or temporal specifications). To evaluate the different extensions, the QA system was run on all 400 German questions from QA@CLEF 2004 and 2005 with varying setups. Some extensions showed positive effects, but currently they are minor and not statistically significant. The paper ends with a discussion of why the improvements are not yet larger.
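The three uses of compound analyses named in the abstract can be illustrated with a minimal Python sketch. It is not InSicht's implementation: the `Compound` type, the function names, the relation labels `SUB` and `SYNO`, the example lemmas, and the projection condition (two compounds share a modifier and their heads are related) are all assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple

Relation = Tuple[str, str, str]  # (source lemma, relation label, target lemma)


class Compound(NamedTuple):
    """Hypothetical compound analysis: a lemma split into modifier and head."""
    lemma: str
    modifier: str
    head: str


def subordination_edges(compounds) -> Set[Relation]:
    """Assume each compound is subordinate to its head (compound SUB head)."""
    return {(c.lemma, "SUB", c.head) for c in compounds}


def project_relations(compounds, part_relations) -> Set[Relation]:
    """Project a relation between two heads onto compounds sharing a modifier.

    This condition is a simplifying assumption, not the paper's rule.
    """
    projected = set()
    for a in compounds:
        for b in compounds:
            if a.lemma == b.lemma or a.modifier != b.modifier:
                continue
            for source, label, target in part_relations:
                if source == a.head and target == b.head:
                    projected.add((a.lemma, label, b.lemma))
    return projected


def analytic_counterpart(c: Compound) -> Tuple[str, str, str]:
    """Schematic analytic paraphrase: the head attributed with the modifier."""
    return (c.head, "of", c.modifier)


# Hypothetical example data (lemmas lowercased).
compounds = [
    Compound("regierungschef", "regierung", "chef"),
    Compound("regierungsleiter", "regierung", "leiter"),
]
part_relations = {("chef", "SYNO", "leiter")}

sub_edges = subordination_edges(compounds)
projected = project_relations(compounds, part_relations)
paraphrase = analytic_counterpart(compounds[0])
```

Under these assumptions, the synonymy between the two heads is projected onto the two compounds, each compound is placed below its head in the hierarchy, and the first compound is paired with a schematic paraphrase that an equivalence rule could relate to it.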
