CRF-OPT: An Efficient High-Quality Conditional Random Field Solver

The conditional random field (CRF) is a popular graphical model for sequence labeling. The flexibility of CRFs poses significant computational challenges for training: using existing optimization packages often leads to long training times and unsatisfactory results. In this paper, we develop CRF-OPT, a general CRF training package that improves both the efficiency and the quality of CRF training. We propose two improved versions of the forward-backward algorithm that exploit redundancy and reduce its running time by several orders of magnitude. Further, we propose an exponential transformation that enforces sufficient step sizes for quasi-Newton methods. This technique improves convergence quality, leading to better training results. We evaluate CRF-OPT on a gene prediction task on pathogenic DNA sequences and show that it is faster and achieves better prediction accuracy than both HMM models and the original CRF model without the exponential transformation.
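The abstract does not spell out how CRF-OPT's two improved forward-backward variants work, so the sketch below shows only the standard, unoptimized log-space forward-backward recursion for a first-order linear-chain CRF, i.e. the baseline those variants speed up. It assumes the sequence score is a sum of per-position label scores (`unary`) and label-to-label transition scores (`trans`); all function names are hypothetical and the code is not the paper's algorithm. It computes the log partition function and per-position label marginals, which gradient-based (e.g. quasi-Newton) CRF training needs at every iteration, and checks the result against brute-force enumeration.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a sequence of floats."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_backward(unary: List[List[float]],
                     trans: List[List[float]]) -> Tuple[float, List[List[float]]]:
    """Standard forward-backward for a first-order linear-chain CRF (assumed model).

    unary[t][y]: score of label y at position t (hypothetical feature scores).
    trans[a][b]: score of moving from label a to label b.
    Returns (log Z, marginals) with marginals[t][y] = P(y_t = y | x).
    """
    T, K = len(unary), len(unary[0])
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]  # beta[T-1][y] = log 1 = 0

    # Forward pass: alpha[t][b] = log-sum of scores of all prefixes ending in b.
    alpha[0] = list(unary[0])
    for t in range(1, T):
        for b in range(K):
            alpha[t][b] = unary[t][b] + logsumexp(
                [alpha[t - 1][a] + trans[a][b] for a in range(K)])

    # Backward pass: beta[t][a] = log-sum of scores of all suffixes after a.
    for t in range(T - 2, -1, -1):
        for a in range(K):
            beta[t][a] = logsumexp(
                [trans[a][b] + unary[t + 1][b] + beta[t + 1][b] for b in range(K)])

    log_z = logsumexp(alpha[T - 1])
    marginals = [[math.exp(alpha[t][y] + beta[t][y] - log_z) for y in range(K)]
                 for t in range(T)]
    return log_z, marginals


def brute_force_log_z(unary: List[List[float]], trans: List[List[float]]) -> float:
    """Enumerate all K**T label sequences; exponential cost, for checking only."""
    T, K = len(unary), len(unary[0])
    scores = []
    for ys in itertools.product(range(K), repeat=T):
        s = sum(unary[t][ys[t]] for t in range(T))
        s += sum(trans[ys[t - 1]][ys[t]] for t in range(1, T))
        scores.append(s)
    return logsumexp(scores)


# Toy example with 4 positions and 3 labels (made-up scores).
unary = [[0.5, -0.2, 0.1], [0.0, 1.0, -1.0], [0.3, 0.3, 0.2], [-0.5, 0.4, 0.9]]
trans = [[0.2, -0.1, 0.0], [0.1, 0.3, -0.4], [-0.2, 0.0, 0.5]]
log_z, marg = forward_backward(unary, trans)
print("log Z (forward-backward):", log_z)
print("log Z (brute force):     ", brute_force_log_z(unary, trans))
```

The recursion costs O(T·K²) per sequence instead of the O(K^T) of enumeration; because this step runs for every training sequence at every optimizer iteration, it is a natural target for the redundancy-exploiting speedups the abstract describes.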