Distances and Stability in Biological Network Theory

In this thesis we introduce, define and quantitatively assess the stability of network reconstruction algorithms. We will focus on the theory, development and implementation of operative procedures and algorithms for assessing stability in complex networks of biological systems, with gene regulatory networks as the key example. A major issue affecting network inference is the high variability of the reconstructed network, and of its inferred topology, under data perturbation, different parameter choices and alternative inference methods. Network stability will thus be used to measure the reliability of the inferred topology and to obtain confidence intervals for the outcomes. These methods will be employed to introduce a new approach to reproducibility in the study of complex networks. They will also be coupled with statistical machine learning models in order to integrate feature selection and network inference within a pathway profiling approach. The evaluation of similarity between networks will be the first and central operative procedure of the developed pipelines. The key point is the identification of distances that compare network structures more finely than classical measures based on the confusion matrix, which only count correctly and incorrectly predicted links and are therefore too coarse for this task. A combination of spectral and edit distances, specifically tailored to biological networks, will be investigated and applied to several high-throughput biological datasets of different nature, addressing different tasks in oncogenomics, neurogenomics and exposomics.
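The following is a minimal sketch of a pipeline of this kind. It is built on several assumptions of ours and is not the thesis's actual implementation. The edit component is taken to be a normalized Hamming distance. The spectral component is an Ipsen-Mikhailov-style distance between Lorentzian-smoothed Laplacian spectra, with the half-width `gamma` calibrated so that the empty and the complete graph lie at distance 1. The two components are combined as sqrt((H^2 + xi*IM^2)/(1 + xi)). Network inference is replaced by a hypothetical thresholded-correlation stand-in, and data perturbation is modelled as random subsampling of 80% of the samples. All function names (`combined_distance`, `correlation_network`, `stability`, and so on) are illustrative.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences (trapz vs trapezoid)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def hamming(a, b):
    """Normalized Hamming (edit) distance between undirected adjacency matrices
    on the same n nodes: 0 if identical, 1 if one is the complement of the other."""
    n = a.shape[0]
    return float(np.abs(a - b).sum() / (n * (n - 1)))


def _density(adj, gamma, grid):
    """Lorentzian-smoothed density of the Laplacian frequencies sqrt(lambda_i),
    skipping the smallest eigenvalue (always 0), rescaled to unit area on the grid."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    lam = np.clip(np.linalg.eigvalsh(lap), 0.0, None)  # ascending; clip round-off
    omega = np.sqrt(lam[1:])
    rho = (gamma / ((grid[:, None] - omega[None, :]) ** 2 + gamma ** 2)).sum(axis=1)
    return rho / _trapezoid(rho, grid)


def ipsen_mikhailov(a, b, gamma, grid):
    """Spectral distance: L2 norm of the difference between the smoothed densities."""
    diff = _density(a, gamma, grid) - _density(b, gamma, grid)
    return float(np.sqrt(_trapezoid(diff ** 2, grid)))


def calibrate_gamma(n, grid, lo=1e-2, hi=10.0, iters=60):
    """Log-scale bisection for the half-width gamma that puts the empty and the
    complete graph on n nodes at spectral distance 1 (distance shrinks as gamma grows)."""
    empty, full = np.zeros((n, n)), np.ones((n, n)) - np.eye(n)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if ipsen_mikhailov(empty, full, mid, grid) > 1.0:
            lo = mid  # peaks too narrow: distance still above 1
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


def combined_distance(a, b, gamma, grid, xi=1.0):
    """Edit and spectral components merged as sqrt((H^2 + xi * IM^2) / (1 + xi));
    xi weights the spectral term (assumed form)."""
    h, im = hamming(a, b), ipsen_mikhailov(a, b, gamma, grid)
    return float(np.sqrt((h ** 2 + xi * im ** 2) / (1.0 + xi)))


def correlation_network(data, threshold):
    """Hypothetical stand-in for an inference method: link two genes when the
    absolute Pearson correlation of their columns exceeds the threshold."""
    adj = (np.abs(np.corrcoef(data, rowvar=False)) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def stability(data, threshold, n_resamples=50, frac=0.8, seed=0):
    """Rebuild the network on random subsamples of the samples (rows), measure
    each one's distance from the full-data network, return mean and 95% interval."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = data.shape
    grid = np.linspace(0.0, np.sqrt(n_genes) + 20.0, 20001)
    gamma = calibrate_gamma(n_genes, grid)
    ref = correlation_network(data, threshold)
    dists = []
    for _ in range(n_resamples):
        rows = rng.choice(n_samples, size=int(frac * n_samples), replace=False)
        net = correlation_network(data[rows], threshold)
        dists.append(combined_distance(ref, net, gamma, grid))
    low, high = np.percentile(dists, [2.5, 97.5])
    return float(np.mean(dists)), (float(low), float(high))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(60, 10))   # synthetic: 60 samples x 10 genes
    x[:, 1] += 2.0 * x[:, 0]        # plant two strong dependencies on gene 0
    x[:, 2] += 2.0 * x[:, 0]
    mean_d, interval = stability(x, threshold=0.5)
    print(f"mean distance {mean_d:.3f}, 95% interval {interval}")
```

The Hamming term only sees which links changed. The spectral term reacts to how those changes reshape the global structure, which is why a combination can be finer than confusion-matrix counts. The percentile interval over resamples is one simple way to attach a confidence interval to a reconstruction. A bootstrap with replacement, or a different inference method, would fit the same loop.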
