Significant similarity and dissimilarity in homologous proteins.

Common practice emphasizes significant sequence similarities among members of a protein family. These similarities presumably reflect evolutionary conservation of structurally and functionally essential residues. The nonconserved regions, on the other hand, may be either selectively neutral or differentiated. We propose several distributional sequence statistics (e.g., clustering of charged residues, compositional biases, and repetitive patterns) as indicators of differentiation events. These ideas are illustrated with various examples, including comparisons among G protein-coupled receptors, herpesvirus proteins, and GTPase-activating proteins.
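
As a rough illustration of one such statistic, the sketch below scans a protein sequence for windows unusually rich in charged residues. It is not the procedure used in the paper: the function name `charge_cluster_windows`, the window length of 30, the z-score cutoff of 3.0, the choice of D, E, K, and R as the charged set, and the binomial approximation for the expected count are all assumptions made for this example. The cutoff is also not corrected for the many overlapping windows tested, so hits should be read as candidates rather than as statistically significant clusters.

```python
import math

# Assumption: Asp, Glu, Lys, Arg count as charged; histidine is left out.
CHARGED = set("DEKR")


def charge_cluster_windows(seq, window=30, z_cutoff=3.0):
    """Return (start, count, z) for each window enriched in charged residues.

    The expected count per window comes from the sequence's own overall
    charged-residue frequency, using a simple binomial approximation.
    """
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return []

    flags = [1 if aa in CHARGED else 0 for aa in seq]
    p = sum(flags) / n  # overall fraction of charged residues
    if p == 0.0 or p == 1.0:
        return []  # no variation, so no window can stand out

    mean = window * p
    sd = math.sqrt(window * p * (1.0 - p))

    hits = []
    count = sum(flags[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one residue to the right.
            count += flags[start + window - 1] - flags[start - 1]
        z = (count - mean) / sd
        if z >= z_cutoff:
            hits.append((start, count, z))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: a 30-residue charged stretch in an alanine background.
    toy = "A" * 100 + "K" * 15 + "E" * 15 + "A" * 100
    for start, count, z in charge_cluster_windows(toy):
        print(start, count, round(z, 2))
```

Measuring enrichment against each sequence's own composition, rather than a database-wide average, keeps a globally charge-rich protein from being flagged everywhere. Comparing where the flagged windows fall in different members of a family is one simple way to look for the kind of differentiation described above.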
