Peer-to-Peer Clustering of Web-browsing Users

For most users, Web-based centralized search engines are the access point to distributed resources such as Web pages and items shared in file-sharing systems. Unfortunately, existing search engines compute their results on the basis of structural information only, e.g., the Web graph structure or query-document similarity estimations. Users' expectations are rarely considered to enhance the subjective relevance of returned results, even though exploiting such information can help search engines satisfy users by tailoring search results. Interestingly, user interests typically follow the clustering property: users who were interested in the same topics in the past are likely to be interested in these same topics in the future. It follows that search results considered relevant by a user belonging to a group of homogeneous users will likely also be of interest to other users from the same group. In this paper, we propose the architecture of a novel peer-to-peer system exploiting collaboratively built search mechanisms. The paper discusses the challenges associated with a system based on the interest clustering principle. The objective is to provide a self-organized network of users, grouped according to the interests they share, that can be leveraged to enhance the quality of the experience perceived by users searching the Web.
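As a rough illustration of the interest clustering principle, the sketch below groups users by the overlap of their past interests and suggests results that similar users found relevant. It is not the architecture proposed in the paper: the Jaccard similarity measure, the function names (`jaccard`, `nearest_peers`, `suggest`), and the sample data are all assumptions made for the example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two interest sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_peers(user: str, interests: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Return the k other users whose past interests overlap most with `user`."""
    others = [u for u in interests if u != user]
    # Sort by descending similarity, breaking ties by name for determinism.
    others.sort(key=lambda u: (-jaccard(interests[user], interests[u]), u))
    return others[:k]


def suggest(user: str, interests: Dict[str, Set[str]],
            relevant: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Results marked relevant by the nearest peers and not yet by `user`."""
    seen = relevant.get(user, set())
    out: List[str] = []
    for peer in nearest_peers(user, interests, k):
        for result in sorted(relevant.get(peer, set())):
            if result not in seen and result not in out:
                out.append(result)
    return out


# Hypothetical sample data: topics each user was interested in,
# and results each user marked as relevant.
interests = {
    "alice": {"p2p", "gossip", "search"},
    "bob": {"p2p", "search", "dht"},
    "carol": {"cooking", "travel"},
}
relevant = {"bob": {"url-1"}, "carol": {"url-2"}}

print(nearest_peers("alice", interests, k=1))
print(suggest("alice", interests, relevant, k=1))
```

In this toy setting, a result that one user marked as relevant is proposed to the user with the most similar interest history. A decentralized system would have to compute such neighborhoods without a global view of all users, which this sketch does not attempt.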
