Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum

Selective pressures on proteins are usually measured by comparing nucleotide sequences. Here we introduce a method to detect selection on the basis of a single genome sequence. We catalogue the relative strength of selection on each gene in the entire genomes of Mycobacterium tuberculosis and Plasmodium falciparum. Our analysis confirms that most antigens are under strong selection for amino-acid substitutions, particularly the PE/PPE family of putative surface proteins in M. tuberculosis and the EMP1 family of cytoadhering surface proteins in P. falciparum. We also identify many uncharacterized proteins that are under strong selection in each pathogen. We provide a genome-wide analysis of natural selection acting on different stages of an organism's life cycle: genes expressed in the ring stage of P. falciparum are under stronger positive selection than those expressed in other stages of the parasite's life cycle. Our method of estimating selective pressures requires far fewer data than comparative sequence analysis, and it measures selection across an entire genome; the method can readily be applied to a large range of sequenced organisms.
