SNPs on human chromosomes 21 and 22 -- analysis in terms of protein features and pseudogenes.

SNPs are useful for genome-wide mapping and the study of disease genes. Previous studies have focused on SNPs in specific genes or on SNPs pooled from a variety of different sources. Here, we present a systematic approach to the analysis of SNPs in relation to various features on a genome-wide scale, with emphasis on protein features and pseudogenes. We have performed a comprehensive analysis of 39,408 SNPs on human chromosomes 21 and 22 from The SNP Consortium (TSC) database, in which SNPs are obtained by random sequencing using consistent and uniform methods. Our study indicates that the occurrence of SNPs is lowest in exons and higher in repeats, introns and pseudogenes. Moreover, in comparing genes and pseudogenes, we find that the SNP density is higher in pseudogenes and that the ratio of nonsynonymous to synonymous changes is also much higher. These observations may be explained by the increased rate of SNP accumulation in pseudogenes, which presumably are not under selective pressure. We have also performed secondary structure prediction on all coding regions and found that there is no preferential distribution of SNPs in α-helices, β-sheets or coils. This could imply that protein structures, in general, can tolerate a wide range of substitutions. Tables relating to our results are available from http://genecensus.org/pseudogene.
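
The two quantities compared above, SNP density per feature class and the ratio of nonsynonymous to synonymous changes, can be illustrated with a minimal sketch. This is not the pipeline used in the study, which the abstract does not describe. The function names `classify_snp`, `snp_density` and `ns_to_s_ratio`, the half-open interval convention, and the example coordinates are all assumptions made for illustration. The codon table is the standard genetic code, and the ratio is a plain count ratio with no per-site normalization.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (illustrative helper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon, pos, alt):
    """Label a single-base change within a codon (hypothetical helper).

    codon: reference codon, e.g. "GGT"; pos: 0-based position in the codon;
    alt: the alternative base at that position.
    """
    mutated = codon[:pos] + alt + codon[pos + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "nonsynonymous"


def snp_density(snp_positions, intervals):
    """SNPs per kilobase over a set of non-overlapping half-open intervals."""
    total_bp = sum(end - start for start, end in intervals)
    hits = sum(
        1 for p in snp_positions
        if any(start <= p < end for start, end in intervals)
    )
    return 1000.0 * hits / total_bp


def ns_to_s_ratio(labels):
    """Count ratio of nonsynonymous to synonymous changes (no site normalization)."""
    nonsyn = sum(1 for x in labels if x == "nonsynonymous")
    syn = sum(1 for x in labels if x == "synonymous")
    return nonsyn / syn if syn else float("inf")


# Made-up example: three SNPs, two annotated intervals totalling 1 kb.
density = snp_density([5, 150, 1200], [(0, 100), (1000, 1900)])
labels = [classify_snp("GGT", 2, "C"),   # GGT -> GGC, both glycine
          classify_snp("ATG", 2, "A"),   # ATG -> ATA, Met -> Ile
          classify_snp("GAA", 0, "T")]   # GAA -> TAA, Glu -> stop
ratio = ns_to_s_ratio(labels)
```

Applying `snp_density` separately to exon, intron, repeat and pseudogene intervals would give the kind of per-feature comparison summarized in the abstract. A pseudogene would first need a reading frame inferred from its parent gene before `classify_snp` could be applied, a step this sketch omits.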
