Evaluating Sampling Techniques for Large Dynamic Graphs

In this section, we examine how the connectivity structure of the graph affects the efficiency of the RDS and MRW sampling techniques. To this end, we simulate both techniques over several types of synthetic graph structures as well as one real graph. Note that we do not consider any of the synthetic graphs to be appropriate models of actual P2P systems or OSNs; we use them in our experiments as examples for which we know the ground truth. At the same time, our choice of the models described below is not entirely arbitrary, but is motivated by the types of heterogeneity that we expect many real-world P2P networks and OSNs to exhibit.

Graph Models: We consider the following four synthetic graph models, which range from very homogeneous to highly heterogeneous. This in turn allows us to explore the heterogeneity of a graph along two dimensions, namely node degree and clustering. We also use a real graph that is a full snapshot of the Gnutella ultrapeer overlay taken on 5/5/2008 at 3pm.

(i) Random graphs (ER): The well-known class of Erdos-Renyi random graphs [2], the simplest variety of random graphs, where links between node pairs are inserted with probability p, independent of anything else.

(ii) Small-world (SW): We consider the "small-world" model proposed by Watts and Strogatz [11], a one-parameter class of networks that interpolates between a regular ring lattice and a random graph without altering the number of nodes and links. This process generates graph structures with high clustering and small path lengths. SW(p) has a single parameter p, and increasing p reduces the degree of clustering in the graph.

(iii) Barabasi and Albert (BA): Many real-world connectivity structures are heterogeneous in the sense that their node degrees or vertex connectivities exhibit high variability or, more specifically, follow a power-law distribution.
To account for this type of heterogeneity, Barabasi and Albert [1] proposed the class of scale-free models of the preferential attachment type, whereby graphs grow by the addition of new nodes and links, and a newly arriving node connects with higher probability to an already highly connected node in the existing graph. Growing a graph in accordance with this preferential attachment mechanism can be shown to generate graph structures whose node degree distributions follow a power law, that is, are scale-free.

(iv) Hierarchical Scale-Free (HSF): In addition to the heterogeneity captured by highly variable node degrees, many real-world graphs also show heterogeneity in the sense of exhibiting a clusters-within-clusters structure. A construction of a class of simple toy models of graphs that have power-law node degree distributions, significant clustering, and a pronounced hierarchical structure was given in [1]. In that article, Barabasi et al. provide an iterative model for generating scale-free graphs with a hierarchical structure, which we call HSF in this paper. An example is shown in Figure 1. The HSF model takes two parameters, n and m: n denotes the size of the fully connected cell that is the building block of the graph, and m denotes the number of iterations used to produce the graph. The graph HSF(n, m) has n^m nodes. As shown in Figure 1, the construction proceeds by generating n cells of size n and connecting them in a certain way. Using the generated structure as a new cell, we repeat this process m times to obtain an HSF graph with m well-defined levels that manifest themselves as clearly visible clusters-within-clusters. This HSF graph can be shown to have a power-law node degree distribution and a high clustering coefficient, independent of the size of the network. As shown in Figure 4(e), both node degree and clustering have power-law distributions in HSF graphs.
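As a rough illustration of the generative rules described for the ER, SW, and BA models, the following Python sketch builds small instances using only the standard library. It is not the simulator used in our experiments. The function names, the lattice degree k, the number of links per new node (links_per_node), and the example sizes are all assumptions chosen for illustration. HSF is omitted because its inter-cell wiring is not specified in the text above.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """ER: insert each possible link independently with probability p."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def watts_strogatz(n, k, p, rng):
    """SW: start from a ring lattice (k nearest neighbors, k even, n > k),
    then rewire each lattice link with probability p.
    The number of nodes and links is left unchanged."""
    edges = set()
    for u in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((u, (u + j) % n)))
    for u in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((u, (u + j) % n))
            if old in edges and rng.random() < p:
                # New endpoints must not create a self-loop or a duplicate link.
                candidates = [w for w in range(n)
                              if w != u and frozenset((u, w)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(candidates))))
    return {tuple(sorted(e)) for e in edges}


def barabasi_albert(n, links_per_node, rng):
    """BA: grow the graph one node at a time; each new node attaches to
    existing nodes with probability proportional to their current degree."""
    edges = set()
    targets = list(range(links_per_node))  # first new node links to the seeds
    endpoints = []  # each node appears once per incident link
    for new in range(links_per_node, n):
        for t in targets:
            edges.add((t, new))
        endpoints.extend(targets)
        endpoints.extend([new] * links_per_node)
        # A uniform draw from `endpoints` is a degree-proportional draw.
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


def degrees(n, edges):
    """Return the degree of every node given an undirected edge set."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


rng = random.Random(1)  # fixed seed so runs are repeatable
er = erdos_renyi(200, 0.05, rng)
sw = watts_strogatz(200, 10, 0.1, rng)
ba = barabasi_albert(200, 5, rng)
for name, g in (("ER", er), ("SW", sw), ("BA", ba)):
    print(name, "links:", len(g), "max degree:", max(degrees(200, g)))
```

Comparing the maximum degree across the three graphs gives a quick, informal view of the degree heterogeneity that separates BA from the more homogeneous ER and SW models.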
In HSF graphs, node clustering is inversely proportional to node degree. This property provides a truly scale-free clustered graph, as the average clustering coefficient does not change with the graph's order [1].

(v) Gnutella (GA): Snapshots of the Gnutella ultrapeer topology, captured in our earlier work [9].

Figure 1(a) presents the KS error for degree distribution from samples collected by the RDS technique