Pattern detection in biomolecules using synthesized random sequence

Abstract: This paper presents a methodology that (1) synthesizes a class of biomolecular sequences into a probabilistic pattern, known as the random sequence for that class, and (2) uses the random sequence to search for and detect subsequences belonging to that class within a much longer sequence. Detection is achieved by optimally matching the random sequence against segments of the search sequence. Because the random sequence captures the probabilistic characteristics of many sequences in the class, comparing it with search sequence segments is much more reliable than comparing two single sequences. The paper presents both the basic notion and an algorithm for the synthesis process. It also describes an experiment in detecting transfer RNA sequences embedded in a long DNA sequence derived from the bovine mitochondrial genome. The successful detection is based on the optimal matching of DNA sequence segments with the random sequence synthesized from 12 transfer RNA sequences.
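The two steps in the abstract (synthesize a probabilistic pattern from a class, then match it against segments of a longer sequence) can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm: it assumes the class members are already aligned and of equal length, replaces the random sequence with a plain position-wise frequency profile, and replaces optimal matching with an ungapped sliding-window log-odds score against a uniform background. The function names, the pseudocount, and the toy sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA alphabet; every input symbol must be in it


def synthesize_profile(sequences, pseudocount=1.0):
    """Build position-wise symbol probabilities from pre-aligned sequences.

    Simplified stand-in for the synthesized random sequence: one
    probability distribution over ALPHABET per position.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sketch assumes pre-aligned sequences of equal length")
    total = len(sequences) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # The pseudocount keeps unseen symbols from getting probability zero.
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def segment_score(profile, segment):
    """Log-odds of a segment under the profile versus a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log(col[ch] / background) for col, ch in zip(profile, segment))


def scan(profile, search_sequence):
    """Score every ungapped window of the search sequence against the profile."""
    width = len(profile)
    return [
        (start, segment_score(profile, search_sequence[start:start + width]))
        for start in range(len(search_sequence) - width + 1)
    ]


def best_hit(profile, search_sequence):
    """Return the (start, score) pair of the highest-scoring window."""
    return max(scan(profile, search_sequence), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy class of four aligned sequences; one member differs at position 3.
    family = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGTAC"]
    profile = synthesize_profile(family)
    start, score = best_hit(profile, "TTTTTTACGTACGGGGGG")
    print(start, round(score, 2))
```

Because the profile pools evidence from every class member, a window that agrees with the common symbols scores well even if it matches no single member exactly, which is the intuition behind the abstract's reliability claim. A closer approximation of the paper's optimal matching would presumably need a dynamic-programming alignment that tolerates insertions and deletions, which this sketch omits.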
