Markov Networks for Detecting Overlapping Elements in Sequence Data

Many sequential prediction tasks involve locating instances of patterns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models, however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental results from a genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.
