A knowledge-based search engine powered by Wikipedia

This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide users as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offer significant advantages over traditional keyword search. Koru was able to assist with almost every query issued to it, making query entry more efficient, improving the relevance of the documents returned, and narrowing the gap between expert and novice seekers.