Linear Co-occurrence Rate Networks (L-CRNs) for Sequence Labeling

Sequence labeling has wide applications in natural language processing and speech processing. Popular sequence labeling models suffer from known problems: hidden Markov models (HMMs) are generative models and cannot encode transition features; conditional Markov models (CMMs) suffer from the label bias problem; and training conditional random fields (CRFs) can be expensive. In this paper, we propose Linear Co-occurrence Rate Networks (L-CRNs) for sequence labeling, which avoid these problems of existing models. The factors of L-CRNs can be locally normalized and trained separately, which leads to a simple and efficient training method. Experimental results on real-world natural language processing data sets show that L-CRNs reduce training time by orders of magnitude while achieving results very competitive with CRFs.
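The claim that L-CRN factors can be locally normalized and trained separately follows from the co-occurrence rate factorization. As a minimal sketch, assuming the standard definition of the co-occurrence rate from the factorization literature (the notation below is ours, not quoted from the paper), the co-occurrence rate of two events is

CR(X; Y) = \frac{P(X, Y)}{P(X)\, P(Y)},

and, for a chain-structured distribution, a label sequence factorizes exactly as

P(y_1, \ldots, y_n \mid x) = \prod_{i=1}^{n} P(y_i \mid x) \; \prod_{i=2}^{n} CR(y_{i-1}; y_i \mid x).

Each unigram factor P(y_i | x) and each pairwise factor CR(y_{i-1}; y_i | x) is a ratio of (conditional) probabilities that can be normalized and estimated on its own, which is what permits the separate, per-factor training described above.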
