Local Access to Huge Random Objects Through Partial Sampling

Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution. Our first set of results focuses on undirected graphs with independent edge probabilities, under certain assumptions. We then use these results to obtain the first efficient implementations for the Erdős-Rényi model and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. We also introduce a new Random-Neighbor query. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (always positive random walks on the integer line). Here, we support Height queries to find the position of the walk, and First-Return queries to find the time when the walk returns to a specified height. These in turn can be used to implement Next-Neighbor queries on random rooted/binary trees, and Matching-Bracket queries on random well-bracketed expressions. Finally, we define a new model that (1) allows multiple independent simultaneous instantiations of the same implementation to be consistent with each other without communication, and (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid $q$-colorings of an input graph $G$ with maximum degree $\Delta$. The distribution over valid colorings is specified via a "huge" underlying graph $G$ that is far too large to be read in sub-linear time. Instead, we access $G$ through local neighborhood probes. We are able to answer queries to the color of any vertex in sub-linear time for $q > 9\Delta$.
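To make the query-access idea concrete, the following is a minimal sketch of a naive on-the-fly sampler for an Erdős-Rényi graph $G(n, p)$. It is not the efficient implementation described above: the class name `LazyGnp`, its method names, and the memoization dictionary are hypothetical choices made for illustration. Each edge is decided by an independent coin flip the first time it is queried and remembered afterwards, so all answers stay consistent with a single random instance.

```python
import random


class LazyGnp:
    """Naive lazily sampled G(n, p); illustrative only, not the paper's method."""

    def __init__(self, n, p, seed=None):
        self.n = n
        self.p = p
        self.rng = random.Random(seed)
        self.edges = {}  # maps (u, v) with u < v to the sampled edge bit

    def vertex_pair(self, u, v):
        """Vertex-Pair query: is {u, v} an edge? Sampled on first access."""
        if u == v:
            return False  # no self-loops
        key = (min(u, v), max(u, v))
        if key not in self.edges:
            self.edges[key] = self.rng.random() < self.p
        return self.edges[key]

    def next_neighbor(self, u, after=-1):
        """Next-Neighbor query: smallest neighbor of u with index > after.

        This linear scan can take time proportional to n when p is small.
        """
        for v in range(after + 1, self.n):
            if self.vertex_pair(u, v):
                return v
        return None
```

In this sketch, `vertex_pair` never allocates anything proportional to `n`, so it works even for very large vertex counts. The linear scan in `next_neighbor` is the kind of cost that an efficient local-access implementation would need to avoid.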
