Recent advances in mass-spectrometry based proteomics software, tools and databases.

Proteomics depends heavily on data generation and data analysis, both of which rely on software and databases. Mass spectrometry-based proteomics has advanced substantially over the last 10 years, compelling the scientific community to upgrade or develop algorithms, tools, and repository databases. Several standalone software packages and comprehensive databases have aided the establishment of integrated omics pipelines and meta-analysis workflows, which have contributed to understanding disease pathobiology, discovering biomarkers, and predicting new therapeutic modalities. For shotgun proteomics, where Data Dependent Acquisition is performed, several user-friendly software tools have been developed that can analyse pre-processed data to provide mechanistic insights into disease. Likewise, for Data Independent Acquisition, pipelines have emerged that can carry out the full task, from building the spectral library to identifying therapeutic targets. Furthermore, in the age of big data, machine learning and cloud computing are making proteomics data analysis more robust, faster, and more in-depth. This review discusses recent advances in, and the development of, software, tools, and databases for mass spectrometry-based proteomics.
