Integration and Visualization of Translational Medicine Data for Better Understanding of Human Diseases

Abstract Translational medicine turns the results of basic life science research into new tools and methods for the clinical environment, for example, new diagnostics or therapies. Today, the translation process is supported by large amounts of heterogeneous data, ranging from medical data to a whole range of -omics data. This is both a great opportunity and a great challenge: translational medicine big data is difficult to integrate and analyze, and its processing requires the involvement of biomedical experts. We show here that visualization and interoperable workflows, combining multiple complex steps, can address at least parts of the challenge. In this article, we present an integrated workflow for the exploration, analysis, and interpretation of translational medicine data in the context of human health. Three Web services (tranSMART, a Galaxy Server, and a MINERVA platform) are combined into one big data pipeline. Their native visualization capabilities give biomedical experts a comprehensive overview of, and control over, the separate steps of the workflow. tranSMART enables flexible filtering of multidimensional integrated data sets to create subsets suitable for downstream processing. The Galaxy Server offers visually aided construction of analytical pipelines from existing or custom components. The MINERVA platform supports the exploration of health- and disease-related mechanisms in a contextualized analytical visualization system. We demonstrate the utility of our workflow by illustrating its successive steps on an existing data set, for which we propose a filtering scheme, an analytical pipeline, and a corresponding visualization of analytical results. The workflow is available as a sandbox environment, where readers can work with the described setup themselves. Overall, our work shows how visualization and interfacing of big data processing services facilitate exploration, analysis, and interpretation of translational medicine data.
