Computational Analysis and Experimental Validation of Gene Predictions in Toxoplasma gondii

Background
Toxoplasma gondii is an obligate intracellular protozoan that infects 20 to 90% of the human population. It can cause both acute and chronic infections, many of which are asymptomatic. In immunocompromised hosts, reactivation of an asymptomatic chronic infection can be fatal. Accurately characterizing the T. gondii proteome is an essential step towards understanding the molecular mechanisms that control transitions between the parasite's life stages and towards identifying candidate drug targets.

Methodology/Principal Findings
We explored the proteome of T. gondii tachyzoites with high-throughput proteomics experiments and by comparison to publicly available cDNA sequence data. Mass spectrometry analysis validated 2,477 gene coding regions with 6,438 possible alternative gene predictions, approximately one third of the T. gondii proteome. The proteomics survey identified 609 proteins that are unique to Toxoplasma when compared with all other known species, including other apicomplexans. Computational analysis identified 787 cases of possible gene duplication events and located at least 6,089 gene coding regions. Commonly used gene prediction algorithms produce very disparate sets of protein sequences, with pairwise overlaps ranging from 1.4% to 12%. Through this experimental and computational exercise we benchmarked gene prediction methods and observed false negative rates of 31 to 43%.

Conclusions/Significance
This study not only provides the largest proteomics exploration of the T. gondii proteome to date, but also illustrates how high-throughput proteomics experiments can elucidate correct gene structures in genomes.
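
The abstract reports two benchmark quantities without defining them: the pairwise overlap between the protein sets produced by different gene predictors, and the false negative rate of each predictor. The Python sketch below shows one plausible way to compute such figures. It is an illustration under stated assumptions, not the authors' method: overlap is taken as the share of identical sequences among all sequences predicted by either method (a Jaccard index), the false negative rate is taken as the fraction of experimentally validated items that a method misses, and the method names and sequences are toy placeholders.

```python
from itertools import combinations


def pairwise_overlap(predictions):
    """Jaccard overlap for every pair of prediction sets.

    `predictions` maps a method name to a set of predicted protein
    sequences. The overlap definition is an assumption; the abstract
    does not state how overlap was measured.
    """
    overlaps = {}
    for a, b in combinations(sorted(predictions), 2):
        union = predictions[a] | predictions[b]
        shared = predictions[a] & predictions[b]
        overlaps[(a, b)] = len(shared) / len(union) if union else 0.0
    return overlaps


def false_negative_rate(predicted, validated):
    """Fraction of validated items that are absent from `predicted`."""
    if not validated:
        raise ValueError("validated set must not be empty")
    return len(validated - predicted) / len(validated)


# Toy placeholder data (hypothetical method names and sequences).
predictions = {
    "method_a": {"MKT", "MLS", "MAA"},
    "method_b": {"MKT", "MGG"},
    "method_c": {"MLS", "MAA", "MPP"},
}
validated = {"MKT", "MLS", "MGG", "MPP"}

overlaps = pairwise_overlap(predictions)
fnr = {name: false_negative_rate(seqs, validated)
       for name, seqs in predictions.items()}
```

Exact sequence identity is the strictest possible matching criterion, which is one reason overlaps computed this way can be low even when two predictors agree on most gene loci but disagree on exon boundaries.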
