Extraction of hidden Markov model representations of signal patterns in DNA sequences.

We have developed a method to extract signal patterns from DNA sequences. The method uses a genetic algorithm (GA) together with the Baum-Welch algorithm to obtain the best hidden Markov model (HMM) representations of these signal patterns. The GA searches for the best network shapes and initial parameters of the HMMs, and the Baum-Welch algorithm optimizes the HMM parameters for each given network shape. The Akaike Information Criterion (AIC), which balances a model's goodness of fit against its complexity, is used to evaluate the HMMs. We applied the method to the extraction of signal patterns in human promoters and in the 5' ends of yeast introns, and obtained HMM representations of the characteristic features of these sequences. To validate the effectiveness of the method, we performed promoter recognition using the obtained HMMs. Two entries containing nine promoters were selected from GenBank 76.0, and the HMM correctly predicted eight of the nine promoters. These results suggest that the method is effective for designing suitable HMM networks and provides reliable models for the recognition of signal patterns.
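The AIC-based evaluation step can be illustrated with a minimal sketch. It uses the standard definition AIC = -2 ln L + 2k, where L is the likelihood of the training sequences under the model and k is the number of free parameters. The function names, the dictionary-based emission format, and the parameter-counting convention (non-zero probabilities per row, minus one for normalization) are assumptions made for illustration; the abstract does not state how the parameters were counted or how the GA used the score.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """Log-likelihood of one sequence under a discrete HMM (scaled forward algorithm)."""
    n = len(start)
    alpha = [start[i] * emit[i][seq[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_like


def count_free_parameters(start, trans, emit):
    """Assumed convention: non-zero entries in each probability row, minus one."""
    rows = [list(start)] + [list(r) for r in trans] + [list(e.values()) for e in emit]
    return sum(max(sum(1 for p in row if p > 0) - 1, 0) for row in rows)


def hmm_aic(seqs, start, trans, emit):
    """AIC = -2 ln L + 2k, summed over all training sequences; lower is better."""
    log_like = sum(forward_log_likelihood(s, start, trans, emit) for s in seqs)
    return -2.0 * log_like + 2.0 * count_free_parameters(start, trans, emit)


# Hypothetical one-state model with uniform emissions over the four bases.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
score = hmm_aic(["ACGT"], [1.0], [[1.0]], [uniform])
```

Under this convention a network shape with more states or transitions is preferred only if its gain in log-likelihood exceeds the number of parameters it adds, which is the trade-off between fit and complexity that the AIC is meant to capture.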