Exploiting semantic proximities for content search over p2p networks

In this paper, we address the issue of content search over peer-to-peer (p2p) networks. We use the concept of semantic proximity, which exploits the commonalities of interest exhibited among peer users, to decompose the network into semantic clusters. We first define search entropy, a metric indicating the average number of packets required to locate the requested content. Spectral clustering is then used to organize the peer nodes into semantic clusters so that (a) the probability that a node locates content within its own cluster is maximized, while (b) the probability of finding this content outside the cluster is simultaneously minimized. The proposed semantic partitioning algorithm is then extended into a hierarchical two-tier scheme, in which practical issues arising in the deployment of a p2p application can be more easily addressed. After the system has been initialized, a dynamic algorithm places new users that join the p2p network into appropriately selected clusters and also handles peer departures, without requiring the matrix eigendecomposition that is necessary for computing the initial static partitioning. Our experimental results show that (a) our static partitioning outperforms traditional and novel search techniques and (b) our dynamic algorithm efficiently tracks the system's progression, keeping the search entropy close to the initially assessed levels.
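The clustering objective described above (keep content lookups inside a node's own cluster, minimize the weight crossing between clusters) can be illustrated with a generic normalized-cut style spectral bipartition. The following is only a minimal sketch under stated assumptions, not the algorithm of this paper: the affinity matrix W (a symmetric score of how strongly two peers' interests overlap) and the function names spectral_bipartition and intra_cluster_fraction are hypothetical and introduced purely for illustration.

```python
import numpy as np


def spectral_bipartition(W):
    """Split peers into two clusters from a symmetric affinity matrix W.

    Assumption: W[i, j] >= 0 scores the interest overlap of peers i and j,
    and every peer has nonzero total affinity.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # weighted degree of each peer
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, mapped back to the
    # generalized problem (D - W) y = lambda * D * y.
    y = d_inv_sqrt * eigvecs[:, 1]
    return (y > 0).astype(int)             # sign of y gives the two clusters


def intra_cluster_fraction(W, labels):
    """Fraction of total affinity that stays inside clusters."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return W[same].sum() / W.sum()


# Toy network: peers 0-2 share interests, peers 3-5 share interests,
# and there is only weak affinity (0.05) across the two groups.
W = np.full((6, 6), 0.05)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

labels = spectral_bipartition(W)
fraction = intra_cluster_fraction(W, labels)
```

In this toy case the sign of the second eigenvector separates the two interest groups, and the intra-cluster fraction plays the role of a rough proxy for the probability of finding content inside one's own cluster. Note that the eigendecomposition is the expensive step, which is the cost the dynamic join/departure handling described above is meant to avoid.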
