Short Exon Detection in DNA Sequences Based on Multifeature Spectral Analysis

This paper presents a new technique for detecting short exons in DNA sequences. The method models four DNA structural properties (bending stiffness, disrupt energy, free energy, and propeller twist) with an autoregressive (AR) model. The linear prediction matrices of the four features are combined so that a single, shared set of linear prediction coefficients is solved for all of them. From these coefficients we estimate the spectrum of the DNA sequence and detect exons from its frequency-1/3 (period-3) component, which is characteristic of protein-coding regions. Because DNA sequences are nonstationary, the AR model is fitted within moving windows of several different sizes. Experiments on the human genome show that the multi-feature method outperforms existing exon detection algorithms.
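The idea of a shared coefficient set can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes forward-only linear prediction solved by ordinary least squares, and it standardizes each feature inside every window so that no single property dominates, which is a choice made here and not stated in the abstract. The function names (`shared_ar_model`, `ar_power`, `period3_profile`) are hypothetical, and the inputs are assumed to be numeric feature signals already derived from the sequence; the mapping from nucleotides to structural property values is not shown.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance copy of x (assumption: per-window scaling)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    s = x.std()
    return x / s if s > 0 else x

def prediction_system(x, order):
    """Forward linear prediction system X a ~ y for one signal.

    Row n of X holds x[n-1], x[n-2], ..., x[n-order]; y holds x[n].
    """
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    return np.array(rows), x[order:]

def shared_ar_model(features, order):
    """Stack the systems of all features and solve for one coefficient set."""
    systems = [prediction_system(f, order) for f in features]
    X = np.vstack([s[0] for s in systems])
    y = np.concatenate([s[1] for s in systems])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ a) ** 2)  # prediction error variance
    return a, sigma2

def ar_power(a, sigma2, freq):
    """AR spectral estimate sigma2 / |A(f)|^2 at freq in cycles per sample."""
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.sum(a * np.exp(-2j * np.pi * freq * k))
    return sigma2 / np.abs(A) ** 2

def period3_profile(features, window, order, step=1):
    """Power at frequency 1/3 for each position of a moving window."""
    features = [np.asarray(f, dtype=float) for f in features]
    length = len(features[0])
    profile = []
    for start in range(0, length - window + 1, step):
        segments = [standardize(f[start:start + window]) for f in features]
        a, sigma2 = shared_ar_model(segments, order)
        profile.append(ar_power(a, sigma2, 1.0 / 3.0))
    return np.array(profile)
```

Calling `period3_profile` with several `window` values gives one profile per window size. How those profiles are combined and thresholded to label exons is not specified in the abstract, so it is left out of this sketch.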
