Hierarchical semi-Markov conditional random fields for deep recursive sequential data

Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is important issue in practice where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.

[1]  Svetha Venkatesh,et al.  MCMC for Hierarchical Semi-Markov Conditional Random Fields , 2009, NIPS 2009.

[2]  Andrew McCallum,et al.  Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data , 2004, J. Mach. Learn. Res..

[3]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[4]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[5]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[6]  Dong Yu,et al.  Sequential Labeling Using Deep-Structured Conditional Random Fields , 2010, IEEE Journal of Selected Topics in Signal Processing.

[7]  Yoram Singer,et al.  The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.

[8]  Martial Hebert,et al.  A hierarchical field framework for unified context-based classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  Bill Triggs,et al.  Scene Segmentation with CRFs Learned from Partially Labeled Images , 2007, NIPS.

[10]  Svetha Venkatesh,et al.  AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[12]  Svetha Venkatesh,et al.  Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Michael I. Jordan Graphical Models , 1998 .

[14]  Svetha Venkatesh,et al.  Policy Recognition in the Abstract Hidden Markov Model , 2002, J. Artif. Intell. Res..

[15]  C. Arndt Maximum entropy estimation , 2001 .

[16]  William T. Freeman,et al.  Constructing free-energy approximations and generalized belief propagation algorithms , 2005, IEEE Transactions on Information Theory.

[17]  Stuart J. Russell,et al.  Dynamic bayesian networks: representation, inference and learning , 2002 .

[18]  Henry A. Kautz,et al.  Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields , 2007, Int. J. Robotics Res..

[19]  Svetha Venkatesh,et al.  Learning Hierarchical Hidden Markov Models with General State Hierarchy , 2004, AAAI.

[20]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[21]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[22]  Yoshua Bengio,et al.  Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.

[23]  Sabine Buchholz,et al.  Introduction to the CoNLL-2000 Shared Task Chunking , 2000, CoNLL/LLL.

[24]  Fernando Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, HLT.

[25]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[26]  Ben Taskar,et al.  Discriminative Probabilistic Models for Relational Data , 2002, UAI.

[27]  Mark W. Schmidt,et al.  Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.

[28]  Eric Horvitz,et al.  Layered representations for learning and inferring office activity from multiple sensory channels , 2004, Comput. Vis. Image Underst..

[29]  James R. Curran,et al.  Log-Linear Models for Wide-Coverage CCG Parsing , 2003, EMNLP.

[30]  Tsujii Jun'ichi,et al.  Maximum entropy estimation for feature forests , 2002 .

[31]  Xuanjing Huang,et al.  Sparse higher order conditional random fields for improved sequence labeling , 2009, ICML '09.

[32]  Svetha Venkatesh Hierarchical Conditional Random Fields for Recursive Sequential Data , 2008, NIPS 2008.

[33]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[34]  Paul A. Viola,et al.  Interactive Information Extraction with Constrained Conditional Random Fields , 2004, AAAI.

[35]  Charles Sutton,et al.  Conditional Probabilistic Context-Free Grammars , 2004 .

[36]  Dan Wu,et al.  Conditional Random Fields with High-Order Features for Sequence Labeling , 2009, NIPS.

[37]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[38]  Henry A. Kautz,et al.  Hierarchical Conditional Random Fields for GPS-Based Activity Recognition , 2005, ISRR.

[39]  Kevin P. Murphy,et al.  Linear-time inference in Hierarchical HMMs , 2001, NIPS.

[40]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[41]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[42]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.