PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

DNA translation frames can be disrupted for several reasons, including: (i) errors in sequence determination; (ii) RNA processing, such as intron removal and guide RNA editing; (iii) less commonly, polymerase frameshifting during transcription or ribosomal frameshifting during translation. Frameshifts frequently confound computational activities involving homologous sequences, such as database searches and inferences on structure, function or phylogeny made from multiple alignments. A dynamic alignment algorithm is reported here which compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting. The algorithm has been incorporated into a new package, WiseTools, for comparison of biological sequences. A protein profile can be compared against either a DNA sequence or a protein sequence. The program PairWise may be used interactively for alignment of any two sequence inputs. SearchWise can perform combinations of searches through DNA or protein databases by a protein profile or DNA sequence. Routine application of the programs has revealed a set of database entries with frameshifts caused by errors in sequence determination.
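The core idea of aligning a profile against all three frames while allowing frameshifts can be sketched with a small dynamic programming routine. This is a minimal illustration under stated assumptions, not the WiseTools implementation: the function names (`profile_from_protein`, `align`), the linear gap model, the treatment of a frameshift as skipping one or two DNA bases, and all score values are hypothetical choices made for the example. Because the DNA axis is indexed by base rather than by codon, every cell can be entered from any frame, so the three frames are compared simultaneously.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def profile_from_protein(seq, match=5):
    """Hypothetical one-sequence profile: each column rewards only its own residue."""
    return [{aa: match} for aa in seq]


def align(profile, dna, mismatch=-4, gap=-8, frameshift=-12):
    """Local alignment of a profile against one DNA strand.

    Returns (best score, number of frameshifts on the best path).
    All penalties are illustrative assumptions.
    """
    dna = dna.upper()
    n, m = len(profile), len(dna)
    # H[i][j] = (score, frameshifts) of the best local alignment that ends
    # after i profile columns and j DNA bases.
    H = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    best = (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            cands = [(0, 0)]  # a local alignment may start anywhere
            if i >= 1 and j >= 3:
                # Match profile column i against the codon ending at base j.
                aa = CODON.get(dna[j - 3:j], "X")
                s, f = H[i - 1][j - 3]
                cands.append((s + profile[i - 1].get(aa, mismatch), f))
            if i >= 1:
                # Skip a profile column (gap in the DNA translation).
                s, f = H[i - 1][j]
                cands.append((s + gap, f))
            if j >= 3:
                # Skip a whole codon (gap in the profile).
                s, f = H[i][j - 3]
                cands.append((s + gap, f))
            for skip in (1, 2):
                # Frameshift: skip one or two bases and continue in a new frame.
                if j >= skip:
                    s, f = H[i][j - skip]
                    cands.append((s + frameshift, f + 1))
            H[i][j] = max(cands, key=lambda c: c[0])
            best = max(best, H[i][j], key=lambda c: c[0])
    return best


if __name__ == "__main__":
    protein = "MKTAYIAKQR"
    dna = "ATGAAAACTGCTTATATTGCTAAACAACGT"  # encodes the protein above
    shifted = dna[:15] + "G" + dna[15:]     # one inserted base after codon 5
    print(align(profile_from_protein(protein), dna))
    print(align(profile_from_protein(protein), shifted))
```

With these assumed scores, the inserted base costs one frameshift penalty but lets the alignment continue across the error, rather than ending where the reading frame breaks. A real profile would hold a score for each residue in every column, and a full implementation would also recover the alignment path by traceback.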
