Conditional random fields for activity recognition

Activity recognition is a key component in building intelligent multi-agent systems. Intrinsically, activity recognition is a temporal classification problem. In this paper, we compare two models for temporal classification: hidden Markov models (HMMs), which have long been applied to the activity recognition problem, and conditional random fields (CRFs). CRFs are discriminative models for labeling sequences. They condition on the entire observation sequence, which avoids the need for independence assumptions between observations. Conditioning on the observations vastly expands the set of features that can be incorporated into the model without violating its assumptions. Using data from a simulated robot tag domain, chosen because it is multi-agent and produces complex interactions between observations, we explore the differences in performance between the discriminatively trained CRF and the generative HMM. Additionally, we examine the effect of incorporating features that violate independence assumptions between observations; such features are typically necessary for high classification accuracy. We find that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM. When features depend on observations from many time steps, we confirm that the CRF remains robust and its performance does not degrade.
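
To illustrate why conditioning on the whole observation sequence allows features that span several time steps, the following is a minimal sketch of a linear-chain CRF, not the implementation used in the paper. The labels, the distance observations, the feature definitions ("near", "closing") and all weights are hypothetical and chosen only for illustration. The normalizer Z(x) is computed with the standard forward recursion, so the conditional probabilities over all label sequences sum to one for a given observation sequence.

```python
import itertools
import math

# Hypothetical activity labels for a two-robot tag scenario (assumption).
LABELS = ["chase", "flee"]

# Hypothetical feature weights; in practice these would be learned.
WEIGHTS = {
    "trans": {("chase", "chase"): 1.0, ("flee", "flee"): 1.0},
    "near": {"chase": 0.5},                     # uses the current observation only
    "closing": {"chase": 1.5, "flee": -1.0},    # uses two observations
}


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(prev, cur, obs, t, w):
    """Unnormalized log-potential at position t.

    The function may inspect any part of obs; the "closing" feature
    compares obs[t] with obs[t - 1], which an HMM emission model that
    assumes independent observations given the state cannot express.
    """
    s = 0.0
    if prev is not None:
        s += w["trans"].get((prev, cur), 0.0)
    if obs[t] < 1.0:
        s += w["near"].get(cur, 0.0)
    if t > 0 and obs[t] < obs[t - 1]:
        s += w["closing"].get(cur, 0.0)
    return s


def log_partition(obs, w):
    """log Z(x) via the forward recursion in log space."""
    alpha = {y: score(None, y, obs, 0, w) for y in LABELS}
    for t in range(1, len(obs)):
        alpha = {
            y: logsumexp([alpha[p] + score(p, y, obs, t, w) for p in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def log_prob(labels, obs, w):
    """log p(labels | obs) under the linear-chain CRF."""
    total, prev = 0.0, None
    for t, y in enumerate(labels):
        total += score(prev, y, obs, t, w)
        prev = y
    return total - log_partition(obs, w)


# Usage: distances between two robots shrinking over three time steps.
observations = [3.0, 2.0, 0.8]
best = max(
    itertools.product(LABELS, repeat=len(observations)),
    key=lambda seq: log_prob(seq, observations, WEIGHTS),
)
print(best, math.exp(log_prob(best, observations, WEIGHTS)))
```

The brute-force search over labelings is only practical for very short sequences; a Viterbi-style recursion would replace it in any realistic setting.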
