Efficient global network learning from local reconstructions

Discovering complex interactions is an important problem in numerous fields ranging from the social sciences to systems biology. Over the past few decades, many network learning methods have shown competitive results on various types of data. A commonly reached conclusion is that some learning approaches are more advisable than others, depending on the type of dataset or the complexity of the underlying network. Another frequently encountered issue is the ever-increasing number of variables that must be handled simultaneously, especially when only a small number of observations is available. ScaleNet, a novel reconstruction method that can embed different types of network discovery approaches within a spectral framework for large graphical models, was introduced recently. The approach identifies sets of connected variables based on the magnitude and sign of the eigenvector elements of a normalized graph Laplacian matrix, and it learns multiple relevant sub-graphs of a large network in parallel. However, the number of eigenvectors to use and the size of the sub-graphs have to be fixed by an expensive cross-validation procedure. In this contribution, we propose heuristics to find both the optimal number of eigenvectors and the optimal number of nodes in the sub-networks. Results on standard large-scale data sets and on a real human gut graph reconstruction show that the proposed approaches save computational time and reach state-of-the-art performance.
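
To make the spectral step concrete, the following is a minimal sketch of how groups of variables might be read off the eigenvectors of a normalized graph Laplacian by sign and magnitude. It is an illustration under stated assumptions, not the ScaleNet implementation: the function names `normalized_laplacian` and `candidate_subgraphs`, the parameters `n_vectors` and `max_size`, the use of the symmetric normalized Laplacian, and the toy similarity matrix are all hypothetical choices made here.

```python
import numpy as np


def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} (assumed variant)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])  # isolated nodes keep a zero scale
    return np.eye(len(d)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def candidate_subgraphs(W, n_vectors, max_size):
    """Hypothetical helper: for each of the first `n_vectors` non-trivial
    eigenvectors, return up to `max_size` variables per sign, ranked by the
    magnitude of their eigenvector entries."""
    L = normalized_laplacian(W)
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    groups = []
    for j in range(1, 1 + n_vectors):  # index 0 is the trivial eigenvector
        v = vecs[:, j]
        for sign in (1, -1):
            idx = np.flatnonzero(sign * v > 0)        # entries sharing this sign
            ranked = idx[np.argsort(-np.abs(v[idx]))]  # largest magnitude first
            if len(ranked) > 1:
                groups.append(sorted(ranked[:max_size].tolist()))
    return groups


# Toy similarity matrix (illustrative data only): two tightly connected
# blocks of three variables, weakly linked to each other.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

groups = candidate_subgraphs(W, n_vectors=1, max_size=3)
print(sorted(groups))
```

In this sketch, `n_vectors` and `max_size` play the role of the two quantities that the abstract says are otherwise fixed by cross-validation; each returned group could then be passed to any local network learning method, independently of the others.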
