CAPER 3.0: A Scalable Cloud-Based System for Data-Intensive Analysis of Chromosome-Centric Human Proteome Project Data Sets.

The Chromosome-centric Human Proteome Project (C-HPP) aims to catalog genome-encoded proteins using a chromosome-by-chromosome strategy. As the C-HPP proceeds, the growing demand for data-intensive analysis of MS/MS data poses a challenge to the proteomics community, especially to small laboratories that lack computational infrastructure. To address this challenge, we have upgraded the CAPER browser to CAPER 3.0, a scalable cloud-based system for data-intensive analysis of C-HPP data sets. CAPER 3.0 uses cloud computing technology to facilitate MS/MS-based peptide identification and can run on both public and private clouds. It provides a graphical user interface (GUI) that helps users transfer data, configure jobs, track progress, and visualize results. These features enable users without programming expertise to conduct data-intensive analyses. Here, we illustrate the usage of CAPER 3.0 with four data-intensive mass spectrometry problems: detecting novel peptides, identifying single amino acid variants (SAVs) derived from known missense mutations, identifying sample-specific SAVs, and identifying exon-skipping events. CAPER 3.0 is available at http://prodigy.bprc.ac.cn/caper3.
