Mining answers in German Web pages

We present a novel method for mining textual answers in German Web pages, using semi-structured natural-language (NL) questions and Google for initial document retrieval. We exploit the redundancy of the Web by weighting all named entities (NEs) identified in the relevant document set according to their occurrences and distributions. The ranked NEs serve as our primary anchors for document indexing, paragraph selection, and answer identification. Answer identification depends on two factors: the overlap of terms at different levels (e.g., tokens and named entities) between queries and sentences, and the relevance of the identified NEs that correspond to the expected answer type. The set of answer candidates is further subdivided into ranked equivalence classes, from which the final answer is selected. The system has been evaluated using question-answer pairs extracted from a popular German quiz book.
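The abstract does not give the weighting or scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an NE's weight is its total occurrence count scaled by the fraction of retrieved documents that mention it, and that a sentence's score is its token overlap with the query plus the weight of its best NE. The function names `weight_entities` and `score_sentence`, and both formulas, are hypothetical.

```python
from collections import Counter


def weight_entities(docs):
    """Weight each named entity found in the retrieved documents.

    docs: list of lists of NE strings, one inner list per document.
    Assumed formula (not from the paper): occurrences * document share.
    """
    total = Counter()     # occurrences across all documents
    doc_freq = Counter()  # number of documents mentioning the NE
    for entities in docs:
        total.update(entities)
        doc_freq.update(set(entities))
    n_docs = len(docs)
    return {ne: total[ne] * doc_freq[ne] / n_docs for ne in total}


def score_sentence(query_tokens, sentence_tokens, sentence_entities, weights):
    """Combine query/sentence token overlap with the best NE weight.

    Assumed formula (not from the paper): overlap count + max NE weight.
    """
    overlap = len(set(query_tokens) & set(sentence_tokens))
    best_ne = max((weights.get(ne, 0.0) for ne in sentence_entities),
                  default=0.0)
    return overlap + best_ne


# Toy data: NEs extracted from three hypothetical retrieved documents.
docs = [["Berlin", "Goethe"], ["Berlin"], ["Berlin", "Berlin", "Weimar"]]
weights = weight_entities(docs)
score = score_sentence(
    ["hauptstadt", "deutschland"],
    ["berlin", "ist", "die", "hauptstadt"],
    ["Berlin"],
    weights,
)
```

In this toy setting an NE that recurs across many documents outweighs one that appears once, which is the redundancy effect the abstract describes. Filtering NEs by expected answer type and grouping candidates into equivalence classes are left out.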
