Stacked Conditional Random Fields Exploiting Structural Consistencies

Conditional Random Fields (CRFs) are popular models for labeling unstructured or textual data. Like many machine learning approaches, these undirected graphical models assume instances to be independently distributed. In real-world applications, however, data is often grouped in a natural way, e.g., by its creation context, and the instances within each group share additional consistencies in the structure of their information. This paper proposes a domain-independent method for exploiting these consistencies by combining two CRFs in a stacked learning framework. The approach comprises three successive steps of inference: first, an initial CRF processes single instances as usual; next, rule learning is applied collectively to all labeled outputs of one context to acquire descriptions of its specific properties; finally, these descriptions serve as dynamic, high-quality features in an additional (stacked) CRF. The approach is evaluated on a real-world dataset for the segmentation of references and achieves a significant reduction of the labeling error.
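The three inference steps above can be sketched in code. This is a minimal, hypothetical illustration of the stacking control flow only: the base CRF, the rule learner, and the stacked CRF are replaced by simple stand-ins (title-case heuristic, per-token majority label, rule lookup), not the paper's actual models or features.

```python
# Hypothetical sketch of the three-step stacked pipeline (stand-in components,
# not the paper's CRFs or rule learner).
from collections import Counter

def base_crf_predict(tokens):
    # Step 1 stand-in for the initial CRF: naive per-token labeling.
    return ["AUTHOR" if t.istitle() else "OTHER" for t in tokens]

def learn_context_rules(labeled_instances):
    # Step 2: learn context-specific descriptions collectively from all
    # labeled outputs of one group (here: majority label per token string).
    counts = {}
    for tokens, labels in labeled_instances:
        for tok, lab in zip(tokens, labels):
            counts.setdefault(tok, Counter())[lab] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def stacked_crf_predict(tokens, rules):
    # Step 3 stand-in for the stacked CRF, which consumes the learned
    # rules as additional dynamic features.
    return [rules.get(tok, "OTHER") for tok in tokens]

def pipeline(group):
    # Run all three steps over one context (a group of instances).
    labeled = [(toks, base_crf_predict(toks)) for toks in group]
    rules = learn_context_rules(labeled)
    return [stacked_crf_predict(toks, rules) for toks in group]
```

The key design point mirrored here is that the rules are learned per group at prediction time, so the stacked model receives features that are specific to the current context rather than fixed at training time.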
