The Proteogenomic Mapping Tool

Background
High-throughput mass spectrometry (MS) proteomics data is increasingly being used to complement traditional structural genome annotation methods. To keep pace with the speed of experimental data generation and to aid structural genome annotation, experimentally observed peptides need to be mapped back to their source genome location quickly and exactly. Until now, the tools for this task have been limited to custom scripts written by individual research groups to analyze their own data; these scripts are generally not widely available and do not scale well to large eukaryotic genomes.

Results
The Proteogenomic Mapping Tool includes a Java implementation of the Aho-Corasick string searching algorithm. It takes standardized file types as input and rapidly searches experimentally observed peptides for exact matches against a given genome translated in all six reading frames. The Java implementation allows the application to scale well to larger eukaryotic genomes while providing cross-platform functionality.

Conclusions
The Proteogenomic Mapping Tool is a standalone application for mapping peptides back to their source genome. It runs on a number of operating system platforms with standard desktop computer hardware and executes very rapidly for a variety of datasets. Because different genetic codes can be selected for different organisms, researchers can easily customize the tool to their own research interests. The tool is recommended for anyone working to structurally annotate genomes using MS-derived proteomics data.
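
The search described in the Results can be illustrated with a minimal Java sketch: translate the genome in all six reading frames, scan each translation with an Aho-Corasick automaton built from the observed peptides, and convert every exact match back to forward-strand nucleotide coordinates. This is not the tool's source code. The class name `SixFrameMapperSketch`, its method names, and the tab-separated output format are assumptions made for illustration, and the sketch assumes an uppercase, in-memory genome string, non-empty peptides, and the standard genetic code (another code could be passed to `translate` as a 64-character string in the same codon order).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not the tool's API. */
final class SixFrameMapperSketch {
    // Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position.
    private static final String BASES = "TCAG";
    private static final String STANDARD_CODE =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** One trie node of the Aho-Corasick automaton. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                    // longest proper suffix that is also a trie prefix
        final List<String> hits = new ArrayList<>(); // peptides ending at this node
    }

    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char c = dna.charAt(i);
            sb.append(c == 'A' ? 'T' : c == 'T' ? 'A' : c == 'C' ? 'G' : c == 'G' ? 'C' : 'N');
        }
        return sb.toString();
    }

    /** Translates dna starting at offset 0, 1 or 2; codons with unknown bases become 'X'. */
    static String translate(String dna, int offset, String code) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i + 3 <= dna.length(); i += 3) {
            int a = BASES.indexOf(dna.charAt(i));
            int b = BASES.indexOf(dna.charAt(i + 1));
            int c = BASES.indexOf(dna.charAt(i + 2));
            sb.append(a < 0 || b < 0 || c < 0 ? 'X' : code.charAt(16 * a + 4 * b + c));
        }
        return sb.toString();
    }

    /** Builds the peptide trie, then sets failure links breadth-first. */
    private static Node build(List<String> peptides) {
        Node root = new Node();
        for (String p : peptides) {
            Node n = root;
            for (char ch : p.toCharArray()) n = n.next.computeIfAbsent(ch, k -> new Node());
            n.hits.add(p);
        }
        ArrayDeque<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) { child.fail = root; queue.add(child); }
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            for (Map.Entry<Character, Node> e : n.next.entrySet()) {
                Node f = n.fail;
                while (f != root && !f.next.containsKey(e.getKey())) f = f.fail;
                Node target = f.next.get(e.getKey());
                Node child = e.getValue();
                child.fail = (target != null) ? target : root;
                child.hits.addAll(child.fail.hits); // inherit matches that end at the suffix
                queue.add(child);
            }
        }
        return root;
    }

    /** Returns hits as "peptide TAB strand TAB start TAB end" (1-based, forward-strand coordinates). */
    static List<String> map(String genome, List<String> peptides) {
        Node root = build(peptides);
        List<String> out = new ArrayList<>();
        String[] strands = { genome, reverseComplement(genome) };
        for (int s = 0; s < 2; s++) {
            for (int offset = 0; offset < 3; offset++) {
                String protein = translate(strands[s], offset, STANDARD_CODE);
                Node n = root;
                for (int i = 0; i < protein.length(); i++) {
                    char ch = protein.charAt(i);
                    while (n != root && !n.next.containsKey(ch)) n = n.fail;
                    n = n.next.getOrDefault(ch, root);
                    for (String p : n.hits) {
                        // Nucleotide span on the scanned strand (0-based, end exclusive).
                        int from = offset + 3 * (i - p.length() + 1);
                        int to = offset + 3 * (i + 1);
                        int start = (s == 0) ? from + 1 : genome.length() - to + 1;
                        int end = (s == 0) ? to : genome.length() - from;
                        out.add(p + "\t" + (s == 0 ? "+" : "-") + "\t" + start + "\t" + end);
                    }
                }
            }
        }
        return out;
    }
}
```

For example, `SixFrameMapperSketch.map("ATGAAACGTTGG", List.of("KR", "TF"))` reports `KR` on the forward strand and `TF` on the reverse strand, both spanning nucleotides 4 to 9. Because the automaton is built once from the whole peptide set, each translated frame is scanned in a single pass regardless of how many peptides are searched, which is the property that makes this approach attractive for large genomes. A production tool would additionally need streaming input for chromosome-sized sequences, which this sketch omits.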
