Effects of Sample Size and Dimensionality on the Performance of Four Algorithms for Inference of Association Networks in Metabonomics

We investigated the effect of sample size and dimensionality on the performance of four algorithms (ARACNE, CLR, CORR, and PCLRC) when they are used to infer metabolite association networks. We report that as many as 100 to 400 samples may be necessary to obtain stable network estimates, depending on the algorithm and the number of measured metabolites. The CLR and PCLRC methods produce similar results, whereas correlation-based inference (CORR) yields sparse networks. We found ARACNE to be unsuitable for this application because it was unable to recover the underlying metabolite association network. We recommend the PCLRC algorithm for inferring metabolite association networks.
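
Purely as an illustration, and not the implementation used in this study, the following Python sketch shows two of the four approaches on toy data. The first is a CORR-style network obtained by thresholding Pearson correlations. The second is a set of CLR scores, in which each mutual-information value is converted to z-scores against the background distribution of both metabolites, with negative z-scores clipped to zero, and the two are combined as sqrt(z_i^2 + z_j^2). The 0.6 threshold, the Gaussian mutual-information estimate, the simulated data, and all function names are assumptions chosen for the example. ARACNE and PCLRC are not shown. As one possible measure of stability, the sketch uses the Jaccard index between networks inferred from two independent data sets of the same size.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """data: list of samples (rows), each a list of metabolite values (columns)."""
    cols = list(zip(*data))
    p = len(cols)
    return [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(p)]
            for i in range(p)]


def corr_network(r, threshold=0.6):
    """CORR-style network: undirected edges (i, j), i < j, with |r_ij| >= threshold.
    The 0.6 cut-off is an arbitrary illustrative choice."""
    p = len(r)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(r[i][j]) >= threshold}


def clr_scores(r):
    """CLR scores using a Gaussian mutual-information estimate I = -0.5*log(1 - r^2)."""
    p = len(r)
    # Cap r^2 just below 1 so that log(1 - r^2) stays finite.
    mi = [[0.0 if i == j else -0.5 * math.log(1.0 - min(r[i][j] ** 2, 0.999999))
           for j in range(p)] for i in range(p)]
    # Background mean and standard deviation of each metabolite's MI with all others.
    background = []
    for i in range(p):
        vals = [mi[i][j] for j in range(p) if j != i]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
        background.append((mu, sd))
    scores = {}
    for i in range(p):
        for j in range(i + 1, p):
            # z-scores of I_ij against each endpoint's background, negatives clipped to 0
            zi = max(0.0, (mi[i][j] - background[i][0]) / background[i][1])
            zj = max(0.0, (mi[i][j] - background[j][0]) / background[j][1])
            scores[(i, j)] = math.sqrt(zi ** 2 + zj ** 2)
    return scores


def jaccard(a, b):
    """Jaccard index of two edge sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def simulate(n, seed):
    """Toy data: metabolites 0-2 share a latent factor; metabolite 3 is independent."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        row = [z + 0.5 * rng.gauss(0.0, 1.0) for _ in range(3)]
        row.append(rng.gauss(0.0, 1.0))
        data.append(row)
    return data


if __name__ == "__main__":
    # Stability check: overlap of CORR networks from two independent data sets of size n.
    for n in (20, 400):
        a = corr_network(correlation_matrix(simulate(n, seed=1)))
        b = corr_network(correlation_matrix(simulate(n, seed=2)))
        print(n, sorted(a), round(jaccard(a, b), 2))
```

Running the script prints the inferred edges and the Jaccard overlap for n = 20 and n = 400. With real metabolomics data, the same comparison could be repeated over a range of sample sizes and numbers of metabolites.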
