Hidden Conditional Ordinal Random Fields for Sequence Classification

Conditional Random Fields and Hidden Conditional Random Fields are a staple of many sequence tagging and classification frameworks. An underlying assumption in those models is that the state sequences (tags), observed or latent, take their values from a set of nominal categories. These nominal categories typically indicate tag classes (e.g., part-of-speech tags) or clusters of similar measurements. However, in some sequence modeling settings it is more reasonable to assume that the tags indicate ordinal categories or ranks. Dynamic envelopes of sequences such as emotions or movements often exhibit intensities growing from neutral, through raising, to peak values. In this work we propose a new model family, Hidden Conditional Ordinal Random Fields (HCORFs), that explicitly models sequence dynamics as the dynamics of ordinal categories. We formulate those models as generalizations of ordinal regressions to structured (here sequence) settings. We show how classification of entire sequences can be formulated as an instance of learning and inference in H-CORFs. In modeling the ordinal-scale latent variables, we incorporate recent binning-based strategy used for static ranking approaches, which leads to a log-nonlinear model that can be optimized by efficient quasi-Newton or stochastic gradient type searches. We demonstrate improved prediction performance achieved by the proposed models in real video classification problems.

[1]  Amnon Shashua,et al.  Ranking with Large Margin Principle: Two Approaches , 2002, NIPS.

[2]  Qingshan Liu,et al.  RankBoost with l1 regularization for facial expression recognition and intensity estimation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Shumeet Baluja,et al.  Pagerank for product image search , 2008, WWW.

[4]  Shaogang Gong,et al.  Conditional Mutual Infomation Based Boosting for Facial Expression Recognition , 2005, BMVC.

[5]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[6]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[7]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[9]  Ying-li Tian,et al.  Evaluation of Face Resolution for Expression Analysis , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[10]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[11]  Takeo Kanade,et al.  Detection, tracking, and classification of action units in facial expression , 2000, Robotics Auton. Syst..

[12]  Wei Chu,et al.  New approaches to support vector ordinal regression , 2005, ICML.

[13]  Nenghai Yu,et al.  Multiple-instance ranking: Learning to rank images for image retrieval , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Wei Chu,et al.  Gaussian Processes for Ordinal Regression , 2005, J. Mach. Learn. Res..

[15]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[16]  Mark W. Schmidt,et al.  Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.

[17]  Alex Acero,et al.  Hidden conditional random fields for phone classification , 2005, INTERSPEECH.

[18]  Martial Hebert,et al.  Discriminative Random Fields , 2006, International Journal of Computer Vision.

[19]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[20]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[21]  Trevor Darrell,et al.  Conditional Random Fields for Object Recognition , 2004, NIPS.

[22]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.