Strong Ties vs. Weak Ties: Studying the Clustering Paradox for Decentralized Search

We studied decentralized search in information networks, focusing on how network clustering affects the findability of relevant information sources. We developed a multiagent system to simulate peer-to-peer networks in which peers work with one another to forward queries to targets containing relevant information, and we evaluated the effectiveness, efficiency, and scalability of decentralized search. Experiments on a network of 181 peers showed that the RefNet method, which relies on topical similarity cues, outperformed random walks and reached relevant peers through short search paths. When the network was extended to a larger community of 5890 peers, however, the advantage of the RefNet method was limited by noise from many topically irrelevant connections, or weak ties. By applying topical clustering and a clustering exponent α to guide network rewiring, we studied the roles of strong ties and weak ties, particularly their influence on distributed search. Interestingly, we found an inflection point for α: below it, performance suffered from many remote connections that disoriented searches; above it, performance degraded for lack of weak ties that could move queries quickly from one segment of the network to another. The inflection threshold for the 5890-peer network was α ≈ 3.5. Further experiments on larger networks of up to 4 million peers demonstrated that clustering optimization is crucial for decentralized search. Although overclustering only moderately degraded search performance on small networks, it led to a dramatic loss in search efficiency on large networks. We discuss the implications for the scalability of distributed systems that rely on clustering for search.
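To make the role of the clustering exponent concrete, the following minimal sketch shows one way an exponent can bias rewiring toward nearby peers. It assumes a Kleinberg-style rule in which a peer links to another with probability proportional to distance raised to the negative exponent; the abstract does not state the exact rule used, so this form, the function names `link_probabilities` and `rewire`, and the sample distances are illustrative assumptions only.

```python
import random


def link_probabilities(distances, alpha):
    """Turn topical distances into link probabilities.

    ASSUMPTION: probability is proportional to d ** (-alpha), as in
    Kleinberg-style small-world models. Distances must be positive.
    """
    weights = [d ** (-alpha) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def rewire(distances, alpha, k, rng):
    """Sample k candidate indices, with replacement for simplicity.

    A larger alpha favors close (strong-tie) candidates; alpha = 0
    ignores distance and yields uniformly random (weak-tie) links.
    """
    probs = link_probabilities(distances, alpha)
    return rng.choices(range(len(distances)), weights=probs, k=k)


if __name__ == "__main__":
    # Hypothetical topical distances from one peer to five candidates.
    distances = [1, 2, 4, 8, 16]
    for alpha in (0.0, 1.0, 3.5):
        probs = link_probabilities(distances, alpha)
        print(alpha, [round(p, 3) for p in probs])
    print(rewire(distances, 3.5, 3, random.Random(0)))
```

Under this assumed rule, a small exponent spreads links across remote peers, while a large exponent concentrates nearly all links on the closest peers and leaves few weak ties, which is the trade-off the abstract describes on either side of the inflection point.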
