PGTools: A Software Suite for Proteogenomic Data Analysis and Visualization.

We describe PGTools, an open-source software suite for the analysis and visualization of proteogenomic data. PGTools comprises applications, libraries, customized databases, and visualization tools for analyzing mass-spectrometry data against combined proteomic and genomic backgrounds. A single command is sufficient to search databases, calculate false discovery rates, group and annotate proteins, generate peptide databases from RNA-Seq transcripts, identify altered proteins associated with cancer, and visualize genome-scale peptide data sets. We experimentally confirm a subset of proteogenomic peptides in human PANC-1 cells and demonstrate the utility of PGTools on a colorectal cancer data set, in which it identified 203 novel protein-coding regions missed by conventional proteomic approaches. PGTools should be equally useful for individual proteogenomic investigations and for international initiatives such as the Chromosome-centric Human Proteome Project (C-HPP). PGTools is available at http://qcmg.org/bioinformatics/PGTools.
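
The abstract lists false discovery rate (FDR) calculation among the pipeline steps but does not say how PGTools computes it. As a minimal sketch, assuming the widely used target-decoy strategy (an assumption here, not a description of PGTools internals), the FDR at a score threshold can be estimated as the ratio of decoy to target peptide-spectrum matches that pass it. The function name `target_decoy_fdr` and the `(score, is_decoy)` input layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among matches scoring at or above `threshold`.

    Each entry in `psms` is (score, is_decoy); a higher score means a
    better match. This is an illustrative target-decoy estimate, not
    the PGTools implementation.
    """
    # Count target and decoy matches that pass the score threshold.
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)

    if targets == 0:
        # No target hits: report 1.0 if only decoys passed, else 0.0.
        return 1.0 if decoys > 0 else 0.0

    # Decoy hits approximate the number of false target hits.
    return min(1.0, decoys / targets)


# Hypothetical scores, for illustration only.
example_psms = [
    (9.1, False), (8.7, False), (8.2, True),
    (7.9, False), (7.5, False), (6.0, True),
]
print(target_decoy_fdr(example_psms, 7.0))  # 1 decoy / 4 targets
```

In practice the threshold is lowered until the estimated FDR reaches a chosen limit (1% is a common choice), and only matches above that threshold are reported.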
