Large genome-sequencing projects have made the development of accurate methods for annotating DNA sequences urgent. Existing methods combine ab initio pattern searches with knowledge gathered from comparison with sequence databases or from training sets of known genes. However, the accuracy of these methods is still far from satisfactory. In the present study, wavelet algorithms combined with an entropy method are being developed as an alternative way to locate genes in genomic DNA sequences. Wavelet methods detect periodicity present in sequences. A promising advantage of wavelets is their adaptivity to the varying lengths of coding and noncoding regions. Moreover, wavelet methods integrated with an entropy method examine only the information content of the sequences and therefore do not need to be trained. Preliminary results on a sample of genomic DNA sequences show that the wavelet approach is feasible and may outperform some knowledge-dependent approaches.
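To make the idea of training-free periodicity and information-content measures concrete, the following is a minimal sketch and not the algorithm developed in this study. It assumes a Morlet-style complex wavelet tuned to period 3 (the codon length) applied to per-base indicator sequences, and a windowed Shannon entropy of 3-mer frequencies. The function names `period3_power` and `window_entropy`, the parameters `scale`, `half`, and `k`, and the synthetic test sequence are all hypothetical choices made for illustration.

```python
import cmath
import math
import random
from collections import Counter


def period3_power(seq, center, scale=30.0):
    """Period-3 power at `center` from a Gaussian-windowed complex exponential.

    Each base (A, C, G, T) is treated as a 0/1 indicator sequence and
    correlated with exp(-2*pi*i*t/3) under a Gaussian envelope of width
    `scale`; the four squared magnitudes are summed and normalized.
    """
    half = int(3 * scale)  # truncate the envelope at three standard deviations
    lo, hi = max(0, center - half), min(len(seq), center + half + 1)
    acc = {base: 0j for base in "ACGT"}
    norm = 0.0
    for n in range(lo, hi):
        t = n - center
        g = math.exp(-0.5 * (t / scale) ** 2)
        norm += g
        if seq[n] in acc:
            acc[seq[n]] += g * cmath.exp(-2j * math.pi * t / 3)
    return sum(abs(v) ** 2 for v in acc.values()) / (norm ** 2)


def window_entropy(seq, center, half=45, k=3):
    """Shannon entropy (bits) of overlapping k-mer frequencies in a window."""
    lo, hi = max(0, center - half), min(len(seq), center + half + 1)
    window = seq[lo:hi]
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Synthetic example (an assumption, not real genomic data): a codon-biased
# stretch of 300 bases flanked by uniformly random sequence.
rng = random.Random(0)
noncoding_left = "".join(rng.choice("ACGT") for _ in range(300))
coding_like = "".join(rng.choice(["GCT", "GAA", "GGC", "GAT"]) for _ in range(100))
noncoding_right = "".join(rng.choice("ACGT") for _ in range(300))
seq = noncoding_left + coding_like + noncoding_right

print("period-3 power (random, coding-like):",
      period3_power(seq, 150), period3_power(seq, 450))
print("3-mer entropy  (random, coding-like):",
      window_entropy(seq, 150), window_entropy(seq, 450))
```

Neither measure uses a training set: both are computed from the sequence alone. Varying `scale` changes the window over which periodicity is assessed, which is one simple way to adapt to regions of different lengths.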