Using Memex to archive and mine community Web browsing experience

Abstract: Keyword indices, topic directories, and link-based rankings are used to search and structure the rapidly growing Web today. Surprisingly little use is made of the years of browsing experience accumulated by millions of people. Indeed, this information is routinely discarded by browsers. Even deliberate bookmarks are stored passively, in browser-dependent formats; this separates them from the dominant world of HTML hypermedia, even if their owners were willing to share them. All this goes against Vannevar Bush's dream of the Memex: an enhanced supplement to personal and community memory. We present the beginnings of a Memex for the Web. Memex blurs the artificial distinction between browsing history and deliberate bookmarks. The resulting glut of data is analyzed in a number of ways. It is indexed not only by keywords but also according to the user's view of topics; this lets the user recall topic-based browsing contexts by asking questions like "What trails was I following when I was last surfing about classical music?" and "What are some popular pages related to my recent trail regarding cycling?" Memex is a browser assistant that performs these functions. We envisage that Memex will be shared by a community of surfers with overlapping interests; in that context, the meaning and ramifications of topical trails may be decided by not one but many surfers. We present a novel formulation of the community taxonomy synthesis problem, algorithms, and experimental results. We also recommend uniform APIs that will help manage advanced interactions with the browser.
