A 3D pattern matching algorithm for DNA sequences

MOTIVATION Biologists usually work with textual DNA sequences (successions of A, C, G and T). This representation lets them study the syntax and other linguistic properties of DNA sequences. However, such a linear coding offers only a local, one-dimensional view of the molecule, whereas the 3D structure of DNA is known to be very important in many essential biological mechanisms. Using 3D conformation models, one can construct the 3D trajectory of a naked DNA molecule. Our previous studies showed that two very different textual DNA sequences can have similar 3D structures.

RESULTS In this article, we present new research on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, that compares the angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one.

AVAILABILITY Available on request from the authors.
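The idea of comparing angles along two 3D trajectories can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here; it only assumes that each trajectory is a list of 3D points (one per base-pair step), that both trajectories have the same length, and that the difference is taken as the mean absolute gap between the turning angles at corresponding points. The names `turning_angles` and `angle_profile_distance` are hypothetical.

```python
import math


def turning_angles(points):
    """Return the angle (in radians) between each pair of consecutive
    segments of a 3D polyline given as a list of (x, y, z) tuples."""
    angles = []
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        u = [b - a for a, b in zip(p0, p1)]  # segment p0 -> p1
        v = [b - a for a, b in zip(p1, p2)]  # segment p1 -> p2
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("trajectory contains a zero-length segment")
        cos_theta = sum(x * y for x, y in zip(u, v)) / norm
        # Clamp to guard against rounding slightly outside [-1, 1].
        angles.append(math.acos(max(-1.0, min(1.0, cos_theta))))
    return angles


def angle_profile_distance(traj_a, traj_b):
    """Mean absolute difference between the turning angles of two
    trajectories (an assumed, illustrative dissimilarity measure)."""
    a = turning_angles(traj_a)
    b = turning_angles(traj_b)
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must have equal length and at least 3 points")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical toy trajectories: a straight one and one with a right-angle bend.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
distance = angle_profile_distance(straight, bent)
```

Because turning angles depend only on the relative direction of successive segments, a measure of this kind is unchanged when a trajectory is translated or rotated as a whole, which is one reason angles are a natural basis for comparing shapes.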
