Found Graph Data and Planted Vertex Covers

A typical way in which network data is recorded is to measure all the interactions involving a specified set of core nodes; this produces a graph containing the core together with a potentially larger set of fringe nodes that have links to the core. Interactions between pairs of nodes in the fringe, however, are not recorded by this process, and hence are not present in the resulting graph data. For example, a phone service provider may only have records of calls in which at least one of the participants is a customer; this can include calls between a customer and a non-customer, but not calls between pairs of non-customers. Knowledge of which nodes belong to the core is a piece of metadata that is crucial for interpreting the network dataset. In many cases, however, this metadata is not available, either because it has been lost due to difficulties in data provenance, or because the network consists of found data obtained in settings such as counter-surveillance. This leads to a natural algorithmic problem, namely the recovery of the core set. Since every recorded edge has at least one endpoint in the core, the core set forms a vertex cover of the graph, and we essentially have a planted vertex cover problem, but with an arbitrary underlying graph. We develop a theoretical framework for analyzing this planted vertex cover problem, based on results in the theory of fixed-parameter tractability, together with algorithms for recovering the core. Our algorithms are fast, simple to implement, and outperform several methods based on network core-periphery structure on various real-world datasets.
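The recording process described in the abstract can be illustrated with a minimal sketch. The toy node names, the edge list, and the helper functions `observed_edges` and `is_vertex_cover` are hypothetical and are not taken from the paper; the sketch only shows why the core is a vertex cover of the recorded graph, and it is not one of the paper's recovery algorithms.

```python
def observed_edges(all_edges, core):
    """Keep only interactions with at least one endpoint in the core.

    Fringe-fringe interactions are dropped, mimicking the recording
    process in which only core-involving interactions are measured.
    """
    core = set(core)
    return [(u, v) for u, v in all_edges if u in core or v in core]


def is_vertex_cover(edges, nodes):
    """Return True if every edge has at least one endpoint in `nodes`."""
    nodes = set(nodes)
    return all(u in nodes or v in nodes for u, v in edges)


# Hypothetical toy data: "a" and "b" play the role of customers (the core).
all_edges = [("a", "b"), ("a", "x"), ("b", "y"), ("x", "y"), ("y", "z")]
core = {"a", "b"}

recorded = observed_edges(all_edges, core)

# Fringe nodes are the non-core nodes that appear in the recorded graph.
fringe = {n for edge in recorded for n in edge} - core

print(recorded)                          # edges that survive recording
print(sorted(fringe))                    # non-core nodes seen in the data
print(is_vertex_cover(recorded, core))   # the core covers every recorded edge
```

In this toy instance the interactions between "x" and "y" and between "y" and "z" never reach the dataset, so "z" does not appear in the recorded graph at all. The recovery problem is the reverse direction: given only `recorded`, decide which nodes formed `core`.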
