Constructing and analyzing biological interaction networks for knowledge discovery

Many biological datasets can be effectively modeled as interaction networks, in which nodes represent biological entities of interest such as proteins, genes, or complexes, and edges represent associations among them. Studying the structure of these networks can provide insight into many biological questions, including the functional characterization of genes and gene products, the characterization of protein-DNA binding, and the understanding of regulatory mechanisms. Constructing biological interaction networks from raw data sets and extracting information from them is therefore critical, but it is also fraught with challenges. First, the network structure is not always known a priori; it must be inferred from raw, heterogeneous biological data sources. Second, biological networks are noisy (they contain unreliable interactions) and incomplete (they lack real interactions), which makes extracting useful information difficult. Third, these networks typically have non-trivial topological properties (e.g., highly skewed degree distributions and small-world structure) that limit the effectiveness of traditional knowledge discovery algorithms. Fourth, these networks are usually dynamic, and investigating their dynamics is essential to understanding the underlying biological system. In this thesis, we address these issues by presenting a set of computational techniques that we developed to construct and analyze three specific types of biological interaction networks: protein-protein interaction networks, gene co-expression networks, and regulatory networks.
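
To make the first and third challenges concrete, the following is a minimal sketch of inferring a gene co-expression network from expression profiles and then measuring two of the topological properties mentioned above (the degree distribution and the local clustering coefficient). Thresholding pairwise Pearson correlation is a common baseline swapped in here for illustration; it is not necessarily the construction developed in this thesis. The function names, the threshold of 0.8, and the toy profiles are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_network(profiles, threshold=0.8):
    """Link two genes when |Pearson r| between their profiles meets the
    (hypothetical) threshold; returns an undirected graph as adjacency sets."""
    adj = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical toy data: four genes measured under four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [1.1, 2.1, 2.9, 4.2],
    "g4": [4.0, 1.0, 3.0, 2.0],
}
adj = coexpression_network(profiles)
# g1, g2 and g3 are strongly correlated and form a triangle; g4 stays isolated.
```

Using the absolute correlation links anti-correlated genes as well as co-varying ones. The threshold directly trades the two problems named above against each other: raising it discards spurious (noisy) edges but drops more real associations, making the network less complete.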
