Genomics and Proteomics

D. L. Hartl et al. In: Genomics and Proteomics, edited by Sándor Suhai. Kluwer Academic / Plenum Publishers, New York, 2000.

Nucleotide sequences contain hidden information about the forces of conservation and variation that shaped their evolutionary history. The effort to extract this information motivates the study of sequence similarity among orthologous and paralogous coding sequences, and gives impetus to improved methods of phylogenetic estimation and hypothesis testing. Variation within populations also carries evidence of evolutionary history. Within coding sequences, different patterns of variation are often observed between nonsynonymous nucleotide substitutions, which cause amino acid replacements, and synonymous nucleotide substitutions, which do not. For some coding sequences these differences are consistent with an evolutionary scenario featuring greater functional constraints on amino acid sequences than on nucleotide sequences. We have developed a sampling theory of selection and random genetic drift for interpreting the numbers of wildtype and variant nucleotides found among the polymorphic sites present in sequences of multiple alleles of a gene. This sampling theory has been used to interpret the patterns of intrapopulation polymorphism of 28 genes in Escherichia coli and Salmonella enterica, each gene exhibiting more than 50 polymorphic sites among the alleles examined. Many of these genes have an excess of singleton amino acid polymorphisms relative to the number of singleton synonymous polymorphisms. (A singleton polymorphism is one in which the sample is monomorphic except for a single variant.) In 22 of the 28 genes, the proportion of singleton nonsynonymous polymorphisms exceeds the proportion of singleton synonymous polymorphisms, and in 8 genes this excess is statistically significant. This pattern is consistent with a model in which most amino acid polymorphisms are slightly deleterious and hence present in samples at lower than expected frequencies. Furthermore, the sampling distribution of polymorphic synonymous nucleotide sites implies selection for optimal codon usage and enables estimation of the magnitude of the selection coefficients.
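The singleton comparison described above can be illustrated with a small sketch. This is not the authors' sampling theory: it swaps in two generic tests, a one-sided Fisher's exact test on a 2x2 table of singleton versus non-singleton polymorphic sites for one gene, and a sign test on the 22-of-28 count. The function names and the per-gene counts are hypothetical, and the sign test assumes that genes are independent and that none of the remaining 6 genes is a tie.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: nonsynonymous, synonymous polymorphic sites.
    Columns: singleton, non-singleton.
    Returns P(top-left cell >= a) under the hypergeometric null,
    i.e. the probability of an excess of nonsynonymous singletons
    at least as large as the one observed.
    """
    row1 = a + b          # all nonsynonymous polymorphic sites
    col1 = a + c          # all singleton sites
    n = a + b + c + d     # all polymorphic sites
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def sign_test_upper(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Hypothetical counts for one gene (not taken from the source):
    # 12 of 20 nonsynonymous sites and 15 of 45 synonymous sites are singletons.
    print("per-gene Fisher p:", fisher_one_sided(12, 8, 15, 30))

    # Counts from the text: 22 of 28 genes show a nonsynonymous singleton excess.
    print("sign test p:", sign_test_upper(22, 28))
```

Under those assumptions the sign test gives a one-sided p-value of roughly 0.002, which suggests that the direction of the excess across genes is unlikely to be due to chance alone. The per-gene test is only a stand-in for the model-based analysis the text describes.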
