Query Log Analysis for Adaptive Dialogue-Driven Search

This chapter examines how the analysis of query logs can improve Information Retrieval and Question Answering systems. Two case studies are discussed. The first describes an intranet search engine deployed on a university campus that can present sophisticated query modifications to the user. It does this via a hierarchical domain model built from multi-word term co-occurrence data. The usage log was analysed using mutual information scores between a query and its refinement, between a query and its replacement, and between two queries occurring in the same session. The results can be used to validate refinements in the domain model and to suggest replacements such as domain-dependent spelling corrections. The second case study describes a dialogue-based question answering system working over a closed document collection largely derived from the Web. Logs here are based around explicit sessions in which an analyst interacts with the system. Analysis of the logs has shown that certain types of interaction lead to increased precision of the results. Future versions of the system will encourage these forms of interaction. The chapter draws three conclusions: first, that there is a growing literature on query log analysis, much of it reviewed here; second, that logs provide many forms of useful information for improving a system; and third, that mutual information measures, combined with automatic term recognition algorithms and hierarchy construction techniques, comprise one approach for enhancing system performance.
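To make the session-based scoring concrete, the following is a minimal sketch of one way a mutual information score between two queries in the same session might be computed. The abstract does not give the exact formulation, so pointwise mutual information over session-level co-occurrence is an assumption here, as are the function name `session_pmi` and the toy log, which is invented purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def session_pmi(sessions):
    """Pointwise mutual information for query pairs sharing a session.

    Assumed formulation (not taken from the chapter):
        PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
    where each probability is the fraction of sessions containing
    the query (or both queries).
    """
    n = len(sessions)
    query_count = Counter()
    pair_count = Counter()
    for session in sessions:
        # Count each query at most once per session; sort so that
        # every pair has a single canonical (alphabetical) key.
        unique = sorted(set(session))
        query_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): math.log2((c / n) / ((query_count[a] / n) * (query_count[b] / n)))
        for (a, b), c in pair_count.items()
    }


# Hypothetical toy log: one list of queries per session.
sessions = [
    ["timetable", "exam timetable"],
    ["timetable", "exam timetable"],
    ["accomodation", "accommodation"],
    ["library"],
]
scores = session_pmi(sessions)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In this toy log the misspelling and its correction score highest because neither occurs in any other session, which is the kind of signal that could flag a domain-dependent spelling correction. A real log would need minimum frequency thresholds, since pointwise mutual information is known to overrate rare pairs.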
