Separate training for conditional random fields using co-occurrence rate factorization

Conditional Random Fields (CRFs) are undirected graphical models well suited to many natural language processing (NLP) tasks, such as part-of-speech (POS) tagging and named entity recognition (NER). The standard training method for CRFs can be very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time; however, piecewise training does not scale well with the cardinality of the variables. In this paper we present separate training for undirected models based on the novel Co-occurrence Rate factorization (CR-F). Separate training is a local training method that requires no global propagation. In contrast to directed Markov models such as MEMMs, separate training is unaffected by the label bias problem even though it uses local normalization. We run experiments on two NLP tasks, POS tagging and NER. Results show that separate training (i) is unaffected by the label bias problem; (ii) reduces training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs. Separate training is therefore a promising technique for scaling undirected models to natural language processing tasks. (More details can be found here: http://eprints.eemcs.utwente.nl/22600/)
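As an illustration only (not the paper's full method, which conditions on the observations x and learns feature weights), the following minimal sketch shows the co-occurrence-rate idea on a toy linear chain of labels: each factor, the unigram probability P(y_i) and the co-occurrence rate CR(y_i; y_{i+1}) = P(y_i, y_{i+1}) / (P(y_i) P(y_{i+1})), is estimated separately from counts, and the factors are combined only at decoding time. All data and names below are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Toy training data: label sequences only (observations omitted for brevity).
sequences = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["NOUN", "VERB", "DET", "NOUN"],
]

# "Separate training": each factor is estimated independently from counts,
# with no global propagation over the chain.
unigram, bigram = Counter(), Counter()
for seq in sequences:
    unigram.update(seq)
    bigram.update(zip(seq, seq[1:]))

n_uni = sum(unigram.values())
n_bi = sum(bigram.values())

def p(y):
    # Empirical unigram label probability P(y).
    return unigram[y] / n_uni

def cr(y1, y2):
    # Co-occurrence rate: CR(y1; y2) = P(y1, y2) / (P(y1) * P(y2)).
    return (bigram[(y1, y2)] / n_bi) / (p(y1) * p(y2))

def score(seq):
    # Combined score under the factorization  prod_i P(y_i) * prod_i CR(y_i; y_{i+1}).
    s = math.prod(p(y) for y in seq)
    for a, b in zip(seq, seq[1:]):
        s *= cr(a, b)
    return s

# Factors are combined only at test time: decode the best length-3
# label sequence (brute force here; Viterbi would be used in practice).
labels = sorted(unigram)
best = max(product(labels, repeat=3), key=score)
print(best)  # -> ('DET', 'NOUN', 'VERB')
```

In the real model each factor would additionally be conditioned on x, but the key property survives in the sketch: every factor is trained on its own, and the chain is stitched together only when scoring a full sequence.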
