B-hist: Entity-centric search over personal web browsing history

Web search is increasingly entity-centric: as a large fraction of common queries target specific entities, search results are progressively augmented with semi-structured and multimedia information about those entities. However, search over personal web browsing history still relies mostly on keyword search. In this paper, we present a novel approach to answering queries over web browsing logs that takes into account the entities appearing in web pages, user activities, and temporal information. Our system, B-hist, aims to provide web users with an effective tool for searching and accessing information they previously looked up on the web by supporting multiple ways of filtering results using clustering and entity-centric search. In the following, we present our system and motivate our user interface (UI) design choices by detailing the results of a survey on web browsing and history search. In addition, we present an empirical evaluation of the entity-based approach we use to cluster web pages.
