Integrating in silico resources to map a signaling network.

The abundance of publicly available life science databases offers a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of a biological system. In this chapter, we describe different in silico tools that quickly and conveniently retrieve data from existing repositories, and we discuss how the available tools are best used for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGRID and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms, and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol for building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generating composite interaction networks enables investigators to extract significantly more information about a given biological system than using a single database or relying solely on primary literature.
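The idea of a composite interaction network can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's protocol: the function names (`merge_interactions`, `to_sif`), the source labels (`db_one`, `db_two`), and the protein identifiers (`P1` to `P4`) are all hypothetical placeholders, and the sketch assumes each database export has already been reduced to pairs of identifiers in a shared namespace. The output uses Cytoscape's simple interaction format (SIF), with `pp` as the interaction type.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge per-database interaction lists into one undirected network.

    `sources` maps a (hypothetical) database label to an iterable of
    (protein_a, protein_b) pairs. Returns a dict that maps each
    undirected edge to the set of labels that reported it.
    """
    support = defaultdict(set)
    for db_name, pairs in sources.items():
        for a, b in pairs:
            # Normalize case and order so A-B and b-a count as the same edge.
            edge = tuple(sorted((a.upper(), b.upper())))
            support[edge].add(db_name)
    return dict(support)


def to_sif(support):
    """Render the merged edges as SIF lines: nodeA <tab> type <tab> nodeB."""
    return [f"{a}\tpp\t{b}" for a, b in sorted(support)]


# Toy input only: placeholder identifiers, not real interaction data.
sources = {
    "db_one": [("P1", "P2"), ("P2", "P3")],
    "db_two": [("p2", "p1"), ("P3", "P4")],
}
merged = merge_interactions(sources)
for line in to_sif(merged):
    print(line)
```

Keeping the set of supporting sources per edge makes it possible to see which interactions are reported by more than one database, which is one reason a composite network can be more informative than any single source.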
