spectrum_utils: A Python package for mass spectrometry data processing and visualization

Given the wide diversity of applications of biological mass spectrometry, custom data analyses are often needed to fully interpret the results of an experiment. Such bioinformatics scripts share the same basic functionality: reading mass spectral data from standard file formats, processing it, and visualizing it. To spare developers from having to reimplement this functionality, spectrum_utils is a Python package for mass spectrometry data processing and visualization that provides it out of the box. Its high-level functionality enables developers to quickly prototype ideas for computational mass spectrometry projects in only a few lines of code. Notably, the data processing functionality is highly optimized for computational efficiency so that it can handle the large volumes of data generated during mass spectrometry experiments. The visualization functionality makes it easy to produce publication-quality figures as well as interactive spectrum plots for inclusion on web pages. spectrum_utils is available for Python 3.6+, includes extensive online documentation and examples, and can be easily installed using conda. It is freely available as open source under the Apache 2.0 license at https://github.com/bittremieux/spectrum_utils.
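As a rough illustration of the kind of spectrum processing referred to above, the following pure-Python sketch chains four common MS/MS preprocessing steps: restricting the m/z range, removing the precursor peak, discarding low-intensity peaks, and square-root intensity scaling. The choice of steps, the `Spectrum` class, the function names, and all parameter values are assumptions made for this example only. It does not reproduce the spectrum_utils API, and plain Python loops like these are not the optimized implementation the package describes.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class Spectrum:
    """Hypothetical minimal MS/MS spectrum; not the spectrum_utils class."""
    precursor_mz: float
    mz: List[float] = field(default_factory=list)
    intensity: List[float] = field(default_factory=list)

    def peaks(self) -> List[Peak]:
        return list(zip(self.mz, self.intensity))

    def with_peaks(self, peaks: List[Peak]) -> "Spectrum":
        # Build a new spectrum from (m/z, intensity) pairs, sorted by m/z.
        peaks = sorted(peaks)
        return Spectrum(self.precursor_mz,
                        [m for m, _ in peaks], [i for _, i in peaks])


def set_mz_range(spec: Spectrum, min_mz: float, max_mz: float) -> Spectrum:
    """Keep only peaks with min_mz <= m/z <= max_mz."""
    return spec.with_peaks(
        [(m, i) for m, i in spec.peaks() if min_mz <= m <= max_mz])


def remove_precursor_peak(spec: Spectrum, tol_ppm: float) -> Spectrum:
    """Drop peaks within tol_ppm parts per million of the precursor m/z."""
    tol = spec.precursor_mz * tol_ppm / 1e6
    return spec.with_peaks(
        [(m, i) for m, i in spec.peaks() if abs(m - spec.precursor_mz) > tol])


def filter_intensity(spec: Spectrum, min_rel_intensity: float,
                     max_num_peaks: int) -> Spectrum:
    """Drop peaks below a fraction of the base peak, then keep the top N."""
    if not spec.mz:
        return spec
    threshold = min_rel_intensity * max(spec.intensity)
    kept = [(m, i) for m, i in spec.peaks() if i >= threshold]
    # Rank by intensity; with_peaks restores m/z order afterwards.
    kept.sort(key=lambda p: p[1], reverse=True)
    return spec.with_peaks(kept[:max_num_peaks])


def scale_intensity_root(spec: Spectrum) -> Spectrum:
    """Square-root transform intensities, then normalize to a maximum of 1."""
    if not spec.mz:
        return spec
    roots = [math.sqrt(i) for i in spec.intensity]
    top = max(roots)
    if top == 0:
        return spec.with_peaks(list(zip(spec.mz, roots)))
    return spec.with_peaks([(m, r / top) for m, r in zip(spec.mz, roots)])


# Usage with made-up peak values.
spec = Spectrum(500.0, [50.0, 150.0, 300.0, 500.0, 700.0, 2100.0],
                [9.0, 16.0, 100.0, 64.0, 25.0, 49.0])
spec = set_mz_range(spec, 100.0, 1500.0)
spec = remove_precursor_peak(spec, tol_ppm=10.0)
spec = filter_intensity(spec, min_rel_intensity=0.05, max_num_peaks=2)
spec = scale_intensity_root(spec)
print(spec.mz, spec.intensity)  # [300.0, 700.0] [1.0, 0.5]
```

Each step returns a new spectrum rather than mutating its input, which keeps the pipeline easy to reorder while prototyping. A production implementation would typically operate on arrays instead of Python lists to handle large datasets efficiently.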
