Improving Distributed Resource Search through a Statistical Methodology of Topological Feature Selection

The Internet is considered a complex network because of its size, its interconnectivity, and the dynamic, constantly evolving rules that govern it. For this reason, searching for distributed resources shared by users and online communities is a complex task that requires efficient search methods. The goal of this work is to improve the performance of distributed information search through analysis of topological features. In this paper we describe a statistical methodology for selecting a set of topological metrics that make it possible to distinguish the type of complex network locally. These metrics are then used to guide the search toward nodes with better connectivity. In addition, we present an algorithm for distributed information search enriched with the selected topological metric. The results show that including the topological metric in the Neighboring-Ant Search algorithm improves its performance by 50% in terms of the number of hops needed to locate a set of resources. The methodology provides a better understanding of why the features were selected and helps explain how this metric affects the search process.
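As an illustration of guiding a search toward better-connected nodes, the following is a minimal sketch only, not the Neighboring-Ant Search algorithm itself. It assumes that the topological metric is node degree (the abstract does not name the selected metric) and that the network is stored as an adjacency dictionary. The function name `choose_next_hop`, the `pheromone` table, and the exponents `alpha` and `beta` are hypothetical names introduced for this example.

```python
import random


def choose_next_hop(graph, current, visited, pheromone=None,
                    alpha=1.0, beta=1.0, rng=random):
    """Pick an unvisited neighbor of `current`.

    Each candidate is weighted by pheromone**alpha * degree**beta, so
    neighbors with more links are more likely to be chosen.
    Assumption: degree stands in for the "selected topological metric".
    """
    pheromone = pheromone or {}
    candidates = [n for n in graph[current] if n not in visited]
    if not candidates:
        return None  # dead end: every neighbor was already visited
    weights = []
    for n in candidates:
        tau = pheromone.get((current, n), 1.0)   # default trail strength
        degree = max(len(graph.get(n, ())), 1)   # avoid an all-zero weight list
        weights.append((tau ** alpha) * (degree ** beta))
    return rng.choices(candidates, weights=weights, k=1)[0]


# Small example network: node "b" is a hub, node "c" is a leaf.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "d", "e", "f"],
    "c": ["a"],
    "d": ["b"],
    "e": ["b"],
    "f": ["b"],
}
```

Calling `choose_next_hop(example_graph, "a", visited={"a"}, beta=2.0)` favors the hub `"b"` over the leaf `"c"`, because the degree term dominates when the pheromone values are equal. Setting `beta=0` removes the topological bias and leaves only the pheromone weighting.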
