PITDB: a database of translated genomic elements

Abstract

PITDB is a freely available database of translated genomic elements (TGEs) that have been observed in PIT (proteomics informed by transcriptomics) experiments. In PIT, a sample is analyzed using both RNA-seq transcriptomics and proteomic mass spectrometry. Transcripts assembled from RNA-seq reads are used to create a library of sample-specific amino acid sequences against which the acquired mass spectra are searched, permitting detection of any TGE, not just those in canonical proteome databases. At the time of writing, PITDB contains over 74 000 distinct TGEs from four species, supported by more than 600 000 peptide spectrum matches. The database, accessible via http://pitdb.org, provides supporting evidence for each TGE, often from multiple experiments, together with an indication of the confidence in the TGE's observation and of its type. Types range from known protein (an exact match to a UniProt protein sequence), through several kinds of protein variant, including splice isoforms, to putative novel molecule. PITDB's web interface allows TGEs to be viewed individually, by species or by experiment, and downloaded for further analysis. PITDB is intended for bench scientists seeking to share their PIT results, for researchers investigating novel genome products in model organisms, and for those wishing to construct proteomes for lesser-studied species.
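The step in which assembled transcripts become a searchable library of amino acid sequences can be illustrated with a minimal sketch. The abstract does not say how open reading frames are selected in PIT, so the approach below (six-frame translation with the standard genetic code, keeping stretches that begin with methionine) is an assumption, and the names `candidate_orfs` and `min_len` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def candidate_orfs(transcript, min_len=3):
    """Collect candidate protein sequences from all six reading frames.

    Assumption: a candidate starts at the first 'M' of a stop-delimited
    stretch and runs to the next stop codon or to the end of the transcript
    (so incomplete ORFs at the 3' end are kept).
    """
    seq = transcript.upper()
    orfs = set()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.add(segment[start:])
    return sorted(orfs)


if __name__ == "__main__":
    print(candidate_orfs("CCATGGCTAAATGACC"))
```

In a real analysis the resulting sequences would be written to a FASTA file and handed to a proteomics search engine; the very small `min_len` here only keeps the toy example short.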
