Paper information - Building applications for interactive data exploration in systems biology

Building applications for interactive data exploration in systems biology

As the systems biology community generates and collects data at an unprecedented rate, there is a growing need for interactive tools for exploring these datasets. Such tools need to combine advanced statistical analyses, relevant knowledge from biological databases, and interactive visualizations in an application with a clear user interface. To answer specific research questions, tools must provide specialized user interfaces and visualizations. While these are application-specific, the underlying components of a data analysis tool can be shared and reused. Application developers can therefore compose applications from reusable services rather than implementing a single monolithic application from the ground up for each project. Our approach for developing data exploration applications in systems biology builds on the microservice architecture. A microservice architecture separates an application into smaller components that communicate using language-agnostic protocols. We show that this design is suitable for bioinformatics, where applications often combine different tools, written in different languages, by different research groups. Packaging each service in a software container enables reuse and sharing of key components between applications, reducing development, deployment, and maintenance time. We demonstrate the viability of our approach through a web application, MIxT blood-tumor, for exploring and comparing transcriptional profiles from blood and tumor samples in breast cancer patients. The application integrates advanced statistical software, up-to-date information from biological databases, and modern data visualization libraries. The web application for exploring transcriptional profiles, MIxT, is online at mixt-blood-tumor.bci.mcgill.ca and open-sourced at github.com/fjukstad/mixt. Packages to build the supporting microservices are open-sourced as a part of Kvik at github.com/fjukstad/kvik.
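The idea of small services that communicate over a language-agnostic protocol can be illustrated with a minimal sketch. This is not the Kvik or MIxT implementation: the `/mean` endpoint, the JSON payload shape, and the names `StatsHandler`, `start_service`, and `call_mean` are all hypothetical, chosen only to show a statistics component exposed over HTTP with JSON so that a client written in any language could call it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean


class StatsHandler(BaseHTTPRequestHandler):
    """Hypothetical compute service: POST /mean with {"values": [...]}."""

    def do_POST(self):
        if self.path != "/mean":
            self.send_error(404, "unknown endpoint")
            return
        # Read the JSON request body and compute the statistic.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"mean": mean(payload["values"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_service(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_mean(base_url, values):
    """Client side: only HTTP and JSON are needed, not the service's language."""
    request = urllib.request.Request(
        base_url + "/mean",
        data=json.dumps({"values": values}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy settings, since the service runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["mean"]


if __name__ == "__main__":
    server = start_service()
    url = "http://127.0.0.1:%d" % server.server_address[1]
    print(call_mean(url, [2.0, 4.0, 9.0]))
    server.shutdown()
```

In a setup like the one the abstract describes, a service of this kind would presumably be packaged in its own container, and a web front end would call it over the network instead of linking against the statistics code directly.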
