Dynamic hierarchical Markov random fields and their application to web data extraction

Hierarchical models have been extensively studied in various domains. However, existing models either assume a fixed model structure or incorporate structural uncertainty generatively. In this paper, we propose Dynamic Hierarchical Markov Random Fields (DHMRFs), which incorporate structural uncertainty in a discriminative manner. A DHMRF consists of two parts: a structure model and a class label model. Both are defined as exponential family distributions. Because they are conditioned on observations, DHMRFs relax the independence assumptions made in directed models. As exact inference is intractable, we develop a variational method to learn the parameters and to find the MAP model structure and label assignment. We apply the model to a real-world web data extraction task: automatically extracting product items for sale on the Web. The results are promising.
