BASTet: Shareable and Reproducible Analysis and Visualization of Mass Spectrometry Imaging Data via OpenMSI

Mass spectrometry imaging (MSI) is a transformative imaging method that supports the untargeted, quantitative measurement of the chemical composition and spatial heterogeneity of complex samples, with broad applications in life sciences, bioenergy, and health. While MSI data can be routinely collected, its broad application is currently limited by the lack of easily accessible analysis methods that can process data of the size, volume, diversity, and complexity generated by MSI experiments. The development and application of cutting-edge analytical methods is a core driver in MSI research for new scientific discoveries, medical diagnostics, and commercial innovation. However, the lack of means to share, apply, and reproduce analyses hinders the broad application, validation, and use of novel MSI analysis methods. To address this central challenge, we introduce the Berkeley Analysis and Storage Toolkit (BASTet), a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. Based on BASTet, we describe the extension of the OpenMSI mass spectrometry imaging science gateway to enable web-based sharing, reuse, analysis, and visualization of data analyses and derived data products. We demonstrate the application of BASTet and OpenMSI in practice to identify and compare characteristic substructures in the mouse brain based on their chemical composition measured via MSI.
