Machine learning for efficient neighbor selection in unstructured P2P networks

Self-reorganization offers the promise of improved performance, scalability and resilience in Peer-to-Peer (P2P) overlays. In practice, however, the benefit of reorganization is often lower than the cost incurred in probing and adapting the overlay. We employ machine learning feature selection in a novel manner: to reduce communication cost, thereby providing the basis of an efficient neighbor selection scheme for P2P overlays. In addition, our method enables nodes to locate and attach to peers that are likely to answer future queries, with no a priori knowledge of those queries. We evaluate our neighbor classifier against live data from the Gnutella unstructured P2P network. We find that Support Vector Machines (SVMs) with forward fitting predict suitable neighbors for future queries with over 90% accuracy, while requiring minimal knowledge of the peer's files or type (fewer than 2% of the features). By providing a means to select neighbors effectively and efficiently in a self-reorganizing overlay, this work is a step toward bringing such architectures to real-world fruition.
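The pairing of a neighbor classifier with feature selection can be illustrated with a small sketch. It is not the authors' implementation. Assume each candidate peer is described by hypothetical binary features: whether the peer shares a file whose name contains a given keyword. Each peer is labeled +1 if it answered a later query from the node and -1 otherwise. The keyword list, toy data and helper names (`train_linear_svm`, `forward_select`, and so on) are invented for illustration. The linear SVM is trained with a Pegasos-style stochastic subgradient method, chosen here only for brevity. Forward fitting is shown as greedy forward selection scored on a held-out validation set.

```python
"""Sketch: greedy forward feature selection around a linear SVM (toy data)."""

# Hypothetical keyword features: x[j] == 1 if the peer shares a file whose
# name contains KEYWORDS[j]. Label +1: the peer answered a later query.
# Invented toy data, not measurements from the paper.
KEYWORDS = ["mp3", "live", "linux", "remix"]
X_train = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 1], [0, 1, 0, 0],
           [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
y_train = [1, -1, 1, -1, 1, -1, 1, -1]
X_val = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y_val = [1, -1, 1, -1]


def add_bias(x):
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def project(X, features):
    # Keep only the chosen feature columns, plus the bias term.
    return [add_bias([row[j] for j in features]) for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Samples are visited in a fixed cyclic order, so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink step from the regularizer ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... then a hinge-loss step if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


def forward_select(X_tr, y_tr, X_val, y_val, max_features=3):
    """Greedily add the feature that most improves validation accuracy."""
    selected, best_acc = [], 0.0
    candidates = list(range(len(X_tr[0])))
    while candidates and len(selected) < max_features:
        round_best, round_acc = None, best_acc
        for j in candidates:
            feats = selected + [j]
            w = train_linear_svm(project(X_tr, feats), y_tr)
            acc = accuracy(w, project(X_val, feats), y_val)
            if acc > round_acc:
                round_best, round_acc = j, acc
        if round_best is None:  # no remaining feature strictly helps: stop
            break
        selected.append(round_best)
        candidates.remove(round_best)
        best_acc = round_acc
    return selected, best_acc


sel, acc = forward_select(X_train, y_train, X_val, y_val)
print([KEYWORDS[j] for j in sel], acc)  # → ['linux'] 1.0
```

In this toy data only the `linux` feature separates the two classes, so selection stops after a single feature. This connects to the communication argument above. Once the features are chosen, a node would only need to ask a candidate peer about those few keywords rather than about its full file list.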
