Network-based interactive navigation and analysis of large biological datasets

Abstract: Over the last decade, advances in high-throughput technologies have produced a flood of new biological data, with individual samples reaching terabyte size. Potential applications are broad, ranging from biotechnology to medicine, but analyzing these datasets poses massive challenges. To make use of the data, the datasets must be integrated, mapped onto existing biological knowledge, and explored by experts. We present UniPAX and BiNA, which together form a scalable system for the integration and analysis of high-throughput data (genomics, transcriptomics, proteomics, and metabolomics) in a network context. A central data warehouse holds the core dataset. A flexible middleware executes custom queries on this dataset and communicates with our visual analytics tool BiNA, the Biological Network Analyzer. We demonstrate how the combination of these tools permits efficient analysis of large-scale datasets for medical applications.
