FUSE: a profit maximization approach for functional summarization of biological networks

Background: The availability of large-scale curated protein interaction datasets has created an opportunity to investigate higher-level organization and modularity within the protein interaction network (PPI) using graph-theoretic analysis. Despite recent progress, systems-level analysis of PPIs remains a daunting task because it is difficult to make sense of the deluge of high-dimensional interaction data. Specifically, techniques that automatically abstract and summarize PPIs at multiple resolutions to provide high-level views of their functional landscape are still lacking. We present a novel, data-driven, and generic algorithm called FUSE (Functional Summary Generator) that generates functional maps of a PPI at different levels of organization, from broad process-process level interactions to in-depth complex-complex level interactions. FUSE takes a profit maximization approach that exploits the Minimum Description Length (MDL) principle to maximize the information gain of the summary graph while satisfying a level-of-detail constraint.

Results: We evaluate the performance of FUSE on several real-world PPIs. We also compare FUSE to state-of-the-art graph clustering methods with GO term enrichment by constructing the biological process landscape of the PPIs. Using the AD network as our case study, we further demonstrate the ability of FUSE to quickly summarize the network and identify many different processes and complexes that regulate it. Finally, we study the higher-order connectivity of the human PPI.

Conclusion: By simultaneously evaluating interaction and annotation data, FUSE abstracts higher-order interaction maps by reducing the details of the underlying PPI to form a functional summary graph of interconnected functional clusters. Our results demonstrate its effectiveness and its superiority over state-of-the-art graph clustering methods with GO term enrichment.
