Convergence Assessment in Latent Variable Models: DNA Applications

A DNA sequence is a long succession of four nucleotides or bases, Adenine, Cytosine, Guanine and Thymine, and can be represented by a finite series \( x = \left( {{x_1}, \cdots,{x_n}} \right) \) ;each base xttaken from the alphabet \( x = \left\{ {A,C,G,T} \right\} \) It turns out that there is an important heterogeneity within the genome.1 Statistical models based on a complete homogeneity assumption are thus unrealistic. We propose a hidden Markov chain approach to identify homogeneous regions in the DNA sequence. The breakpoints which define these regions may thus separate parts of the genome with different functional or structural properties.