Summarizing local context to personalize global web search

The PC desktop is a very rich repository of personal information that efficiently captures the user's interests. In this paper we propose a new approach to automatically personalizing web search, in which user-specific information is extracted from the local desktop. This improves the quality of user profiling while sharing less private information with the search engine. More specifically, we investigate how to select personalized query expansion terms for web search using three desktop-oriented approaches: summarizing the entire desktop data, summarizing only the desktop documents relevant to each user query, and applying natural language processing techniques to extract dispersive lexical compounds from relevant desktop resources. Our experiments with the Google API showed that at least the latter two techniques produce a very strong improvement over current web search.
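To make the second approach more concrete, the following is a minimal sketch of selecting expansion terms from only those desktop documents that share a term with the query. It is an illustration under stated assumptions, not the method evaluated in the paper: the function names, the stopword list, and the simple TF-IDF-style scoring are all hypothetical stand-ins for the summarization techniques mentioned above.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expansion_terms(query, desktop_docs, k=2):
    """Return up to k candidate expansion terms for the query.

    Only documents sharing at least one term with the query are treated as
    relevant. Terms are scored by their frequency in those documents, weighted
    by how rare they are across the whole (local) collection.
    """
    query_terms = set(tokenize(query))
    tokenized = [tokenize(doc) for doc in desktop_docs]
    relevant = [toks for toks in tokenized if query_terms & set(toks)]

    # Document frequency over the entire desktop collection.
    df = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(tokenized)

    scores = Counter()
    for toks in relevant:
        for term, tf in Counter(toks).items():
            if term not in query_terms:
                scores[term] += tf * math.log(1 + n_docs / df[term])

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

The returned terms would be appended to the original query before it is sent to the search engine, so only the expanded query, rather than the desktop content itself, leaves the local machine.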
