Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs

We present a diagrammatic formalism and practical methods for introducing additional independence assumptions into parameter estimation, enabling efficient training of undirected graphical models in locally-normalized pieces. On two real-world data sets we demonstrate that our locally-trained linear-chain CRFs outperform traditional CRFs: they train in less than one-fifth the time and provide a statistically significant gain in accuracy.
