User-Centered Adaptive Information Retrieval

Information retrieval systems are critical for overcoming information overload. A major deficiency of existing retrieval systems is that they generally lack user modeling and do not adapt to individual users. Personalization is expected to overcome this deficiency and significantly improve retrieval accuracy. In this thesis, we study how to put the user at the center of the information retrieval process for personalized search. We develop a decision-theoretic framework for optimizing interactive information retrieval based on eager user model updating. The framework emphasizes immediate and frequent feedback so that the user gets the maximum benefit from context. It serves as a roadmap for studying retrieval models for personalized search. Specific retrieval models for exploiting implicit user context are developed to improve retrieval accuracy. Evaluation indicates that user context information, especially clickthrough information, can effectively and efficiently improve retrieval performance. Sometimes user effort is needed to provide more information and improve retrieval performance. In this scenario, we study how a retrieval system can perform active feedback. We frame the problem as a statistical decision problem and examine several special cases in refining the framework. The experimental results indicate that diversity in the presented documents is a desirable property. On the result representation side, we study how to exploit a user's clickthrough information to adaptively reorganize clustering results. We propose four strategies for adapting clustering results based on user interactions. The simulation experiments show that the adaptation strategies perform differently for different types of users. We also conduct a user study on one adaptive clustering strategy to see whether an adaptive clustering system can bring users better search utility than a static clustering system. The results show that there is generally no significant difference between the two systems from a user's perspective. We design and develop a client-side web search agent, UCAIR, for personalized search. UCAIR captures and exploits implicit context information to immediately rerank any documents that have not yet been seen by the user. User studies show that UCAIR improves performance over the popular search engine on which the agent is built.
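The idea of reranking only the not-yet-seen results from clickthrough information can be illustrated with a minimal sketch. This is not the UCAIR implementation: the function names, the bag-of-words cosine scoring, and the `alpha` weight are all assumptions chosen for illustration. The sketch expands the query with terms from clicked snippets, keeps the results the user has already seen in place, and reorders the rest.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-weight Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_unseen(query, results, clicked_ids, last_seen_rank, alpha=0.5):
    """Rerank results below last_seen_rank using clicked snippets.

    results: list of (doc_id, snippet) pairs in the original rank order.
    clicked_ids: ids of results the user clicked (implicit feedback).
    last_seen_rank: number of top results assumed to be already seen.
    alpha: hypothetical weight given to terms from clicked snippets.
    """
    # Build a simple user model: query terms plus weighted clicked-snippet terms.
    model = Counter()
    for term in tokenize(query):
        model[term] += 1.0
    for doc_id, snippet in results:
        if doc_id in clicked_ids:
            for term in tokenize(snippet):
                model[term] += alpha

    # Seen results stay where they are; only the unseen tail is reordered.
    seen = results[:last_seen_rank]
    unseen = results[last_seen_rank:]
    reranked = sorted(
        unseen,
        key=lambda r: cosine(model, Counter(tokenize(r[1]))),
        reverse=True,
    )
    return seen + reranked


results = [
    ("d1", "jaguar car dealer"),
    ("d2", "jaguar animal habitat"),
    ("d3", "jaguar car price"),
    ("d4", "jaguar cat rainforest animal"),
]
# The user has seen the top two results and clicked the second one.
new_order = [doc_id for doc_id, _ in rerank_unseen("jaguar", results, {"d2"}, 2)]
print(new_order)
```

In this toy example, the click on the animal-related result moves the other animal-related document ahead of the car-related one among the unseen results, while the first two positions are left untouched.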
