Consistent Recovery Threshold of Hidden Nearest Neighbor Graphs

Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to\infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $\liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n\|Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n \, d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n\|Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges.
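
To make the two thresholds concrete, the following is a minimal sketch for the Bernoulli special case, assuming $P_n = \mathrm{Bern}(p)$ and $Q_n = \mathrm{Bern}(q)$ with $0 < p, q < 1$. The function names (`renyi_half`, `kl_bernoulli`, `threshold_ratios`) are illustrative and not taken from the paper. The stated conditions are asymptotic ($\liminf$ as $n\to\infty$), so the ratios computed at a finite $n$ are only indicative.

```python
import math


def renyi_half(p, q):
    """alpha = -2 * log( sum_x sqrt(P(x) Q(x)) ) for P = Bern(p), Q = Bern(q)."""
    # The sum over x in {0, 1} of sqrt(P(x) Q(x)) plays the role of the integral.
    overlap = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))
    return -2.0 * math.log(overlap)


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bern(p) || Bern(q)), assuming 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def threshold_ratios(n, k, p, q):
    """Return (2*alpha/log n, k*D/log n), the two quantities compared with 1."""
    log_n = math.log(n)
    exact_ratio = 2.0 * renyi_half(p, q) / log_n
    almost_exact_ratio = k * kl_bernoulli(p, q) / log_n
    return exact_ratio, almost_exact_ratio


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    exact, almost = threshold_ratios(n=10_000, k=3, p=0.9, q=0.1)
    print(f"2*alpha/log n = {exact:.3f}, k*D/log n = {almost:.3f}")
```

A ratio above 1 suggests the corresponding sufficient condition would hold if the same scaling persisted as $n$ grows. Because the Rényi divergence of order $\frac{1}{2}$ never exceeds the Kullback-Leibler divergence, `renyi_half(p, q)` is always at most `kl_bernoulli(p, q)`.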
