Text Information Extraction Based on Genetic Algorithm and Hidden Markov Model

Because the traditional training method of hidden Markov models (HMMs) for text information extraction is sensitive to the initial model parameters and, in practice, tends to converge to a locally optimal model, a novel hybrid model combining a genetic algorithm (GA) and an HMM for text information extraction is presented. During the parameter training phase, the hybrid method combines the GA and the Baum-Welch algorithm to optimize the HMM parameters globally. To select the initial HMM parameters, the hybrid method uses a GA that represents chromosomes with a real-number matrix encoding and uses likelihood values as fitness values; it then applies a modified Baum-Welch algorithm to re-estimate the parameters and construct the HMM. During the information extraction phase, an improved Viterbi algorithm is used to obtain the optimal state sequence of a test sample. Experimental results show that the new algorithm improves both precision and recall.
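
As an illustration of the GA stage described above, the following Python sketch encodes each chromosome as the real-valued matrices (pi, A, B) of one discrete HMM and scores it by the likelihood of the training sequences. It is a minimal sketch under stated assumptions, not the implementation used in this work: the truncation selection, arithmetic crossover, Gaussian mutation, the use of the log-likelihood (to avoid numerical underflow), and all function names and parameter values (pop_size, generations, rate, sigma) are assumptions chosen for illustration. The modified Baum-Welch and improved Viterbi steps are not reproduced here.

```python
import math
import random

def random_stochastic(rows, cols, rng):
    """Return a rows x cols matrix whose rows are positive and sum to 1."""
    m = [[rng.random() + 1e-3 for _ in range(cols)] for _ in range(rows)]
    return [[v / sum(row) for v in row] for row in m]

def random_chromosome(n_states, n_symbols, rng):
    """A chromosome is the real-valued parameter set (pi, A, B) of one HMM."""
    return {
        "pi": random_stochastic(1, n_states, rng)[0],
        "A": random_stochastic(n_states, n_states, rng),
        "B": random_stochastic(n_states, n_symbols, rng),
    }

def log_likelihood(hmm, obs):
    """Scaled forward algorithm: returns log P(obs | hmm)."""
    n = len(hmm["pi"])
    alpha = [hmm["pi"][i] * hmm["B"][i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * hmm["A"][i][j] for i in range(n)) * hmm["B"][j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def fitness(hmm, sequences):
    """Fitness (assumed here): total log-likelihood of the training sequences."""
    return sum(log_likelihood(hmm, s) for s in sequences)

def normalize(row):
    """Clamp entries to stay positive, then rescale so the row sums to 1."""
    row = [max(v, 1e-6) for v in row]
    s = sum(row)
    return [v / s for v in row]

def crossover(p1, p2, rng):
    """Arithmetic crossover (an assumption): row-wise convex blend of two parents."""
    w = rng.random()
    def blend(r1, r2):
        return normalize([w * a + (1 - w) * b for a, b in zip(r1, r2)])
    return {
        "pi": blend(p1["pi"], p2["pi"]),
        "A": [blend(r1, r2) for r1, r2 in zip(p1["A"], p2["A"])],
        "B": [blend(r1, r2) for r1, r2 in zip(p1["B"], p2["B"])],
    }

def mutate(hmm, rng, rate=0.1, sigma=0.05):
    """Gaussian mutation (an assumption), followed by row renormalization."""
    def mut_row(row):
        return normalize(
            [v + rng.gauss(0, sigma) if rng.random() < rate else v for v in row]
        )
    return {
        "pi": mut_row(hmm["pi"]),
        "A": [mut_row(r) for r in hmm["A"]],
        "B": [mut_row(r) for r in hmm["B"]],
    }

def ga_initialize(sequences, n_states, n_symbols, pop_size=20, generations=30, seed=0):
    """Evolve a population of HMMs and return the fittest one."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_states, n_symbols, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda h: fitness(h, sequences), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=lambda h: fitness(h, sequences))
```

In this sketch the observations are integer symbol indices, for example ga_initialize([[0, 1, 2, 2, 1]], n_states=2, n_symbols=3). The returned model would then serve as the starting point for Baum-Welch re-estimation, so that the GA supplies the global search and Baum-Welch the local refinement.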