Amortized Edge Sampling

We present a sublinear-time algorithm that samples multiple edges from a distribution that is pointwise $\epsilon$-close to the uniform distribution, in an \emph{amortized-efficient} fashion. We consider the adjacency list query model, where access to a graph $G$ is given via degree and neighbor queries. The problem of sampling a single edge in this model was considered by Eden and Rosenbaum (SOSA 18). Let $n$ and $m$ denote the number of vertices and edges of $G$, respectively. Eden and Rosenbaum provided upper and lower bounds of $\Theta^*(n/\sqrt m)$ for sampling a single edge in general graphs (where $O^*(\cdot)$ and $\Theta^*(\cdot)$ suppress $\textrm{poly}(1/\epsilon)$ and $\textrm{poly}(\log n)$ dependencies). We ask whether this query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we obtain an improved amortized per-sample cost if we allow a more costly preprocessing phase? We answer in the affirmative. We present an algorithm that, if the number of required samples $q$ is known in advance, has an overall cost of $O^*(\sqrt q \cdot(n/\sqrt m))$, which is strictly preferable to the $O^*(q\cdot (n/\sqrt m))$ cost resulting from $q$ invocations of the algorithm by Eden and Rosenbaum. More generally, for an input parameter $x>1$, our algorithm has a preprocessing phase with $O^*(n/(x\cdot d_{avg}))$ cost, after which each sample costs $O(x/\epsilon)$, where $d_{avg}$ denotes the average degree of the graph.
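
As a sanity check on how the two stated bounds fit together, the following short calculation sketches how the general trade-off in $x$ can recover the $O^*(\sqrt q \cdot (n/\sqrt m))$ total cost. It assumes $d_{avg} = \Theta(m/n)$ and that the resulting choice of $x$ satisfies $x > 1$; the symbol $T(x)$ for the total cost of preprocessing plus $q$ samples is introduced here for illustration only and is not notation from the text above.

$$T(x) \;=\; O^*\!\left(\frac{n}{x\cdot d_{avg}}\right) + q\cdot O\!\left(\frac{x}{\epsilon}\right).$$

Balancing the two terms up to the suppressed $\textrm{poly}(1/\epsilon)$ factor suggests $x = \sqrt{n/(q\cdot d_{avg})}$, which gives

$$T(x) \;=\; O^*\!\left(\sqrt{\frac{q\cdot n}{d_{avg}}}\right) \;=\; O^*\!\left(\sqrt{q}\cdot\frac{n}{\sqrt m}\right),$$

where the last step uses $n/d_{avg} = \Theta(n^2/m)$. Under these assumptions the amortized cost per sample is $O^*\big((n/\sqrt m)/\sqrt q\big)$, a factor of $\sqrt q$ below the single-sample bound. The requirement $x>1$ corresponds roughly to $q < n/d_{avg}$.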
