Sequential modeling for identifying CpG island locations in human genome

We consider several sequential processing algorithms for identifying genes in human DNA by detecting CpG ("C precedes G") islands. The algorithms are designed to capture the underlying statistical structure of a DNA sequence. Sequential processing based on a Markov model and on a hidden Markov model (HMM) is shown to identify most of the CpG islands in annotated (marked) DNA subsequences drawn from publicly available DNA datasets. We also consider a wavelet-based hidden Markov tree (HMT). In the context of the HMT, we address the design of adaptive wavelets matched to CpG islands, which is accomplished via lifting and genetic-algorithm optimization.
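To make the sequential HMM idea concrete, here is a minimal sketch, not the method or parameters used in this work. It assumes two hidden states (0 = background, 1 = CpG island), each of which emits bases through its own first-order Markov chain P(x_t | x_{t-1}), so that the CpG dinucleotide can be enriched in one state and depleted in the other. Viterbi decoding is used here as one common choice of decoding rule. All transition values are hypothetical, illustrative numbers rather than trained estimates, and the names `viterbi`, `islands`, `BACKGROUND`, and `ISLAND` are introduced only for this example. The input is assumed to contain only A, C, G, and T.

```python
import math

NUC = "ACGT"


def _row(probs):
    """Map probabilities in A, C, G, T order to a dict of log-probabilities."""
    assert abs(sum(probs) - 1.0) < 1e-9
    return {n: math.log(p) for n, p in zip(NUC, probs)}


# Hypothetical first-order tables P(x_t | x_{t-1}), one per hidden state.
# Illustrative values only; they are NOT fitted to any dataset.
BACKGROUND = {  # state 0: CpG dinucleotide depleted
    "A": _row([0.25, 0.25, 0.25, 0.25]),
    "C": _row([0.30, 0.30, 0.10, 0.30]),
    "G": _row([0.25, 0.25, 0.25, 0.25]),
    "T": _row([0.25, 0.25, 0.25, 0.25]),
}
ISLAND = {  # state 1: GC rich, CpG dinucleotide enriched
    "A": _row([0.15, 0.35, 0.35, 0.15]),
    "C": _row([0.15, 0.30, 0.40, 0.15]),
    "G": _row([0.15, 0.35, 0.35, 0.15]),
    "T": _row([0.15, 0.35, 0.35, 0.15]),
}
MODELS = (BACKGROUND, ISLAND)


def viterbi(seq, p_stay=0.999):
    """Most probable 0/1 (background/island) label for each base of seq.

    Assumes seq contains only A, C, G, T (case-insensitive).
    """
    seq = seq.upper()
    if not seq:
        return []
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # Position 0 has no predecessor: equal start odds, uniform base probability.
    score = [math.log(0.5) + math.log(0.25)] * 2
    back = []  # back[t-1][s] = best predecessor state for state s at position t
    for t in range(1, len(seq)):
        prev, cur = seq[t - 1], seq[t]
        new_score, ptr = [], []
        for s in (0, 1):
            # Best way to reach state s at t from either state at t-1.
            cand = [score[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
            best = max((0, 1), key=lambda r: cand[r])
            # Emission uses the state-specific Markov chain on (prev -> cur).
            new_score.append(cand[best] + MODELS[s][prev][cur])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def islands(path):
    """Collapse a 0/1 label path into half-open (start, end) island intervals."""
    out, start = [], None
    for i, s in enumerate(path + [0]):  # sentinel 0 closes a trailing island
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            out.append((start, i))
            start = None
    return out


if __name__ == "__main__":
    # Synthetic example: AT-rich flanks around a CG-repeat "island".
    seq = "AT" * 50 + "CG" * 50 + "TA" * 50
    print(islands(viterbi(seq)))  # [(100, 200)]
```

The switching probability `1 - p_stay` acts as a penalty on state changes, so short CG-rich runs are absorbed into the background unless their cumulative log-likelihood gain exceeds the cost of entering and leaving the island state. In practice, the per-state tables and `p_stay` would be estimated from annotated sequences rather than set by hand.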
