Learning event representation: As sparse as possible, but not sparser

Selecting an optimal event representation is essential for event classification in real-world contexts. In this paper, we investigate the application of qualitative spatial reasoning (QSR) frameworks to the classification of human-object interaction in three-dimensional space, and compare them with quantitative feature extraction approaches used for the same purpose. In particular, we modify QSRlib, a library for computing qualitative spatial relations and calculi, use it for feature extraction, and feed the resulting features into our neural network models. In experiments that use motion captures of human-object interaction as three-dimensional inputs, we observe that qualitative spatial features significantly improve the performance of our machine learning algorithm over our baseline, whereas comparable quantitative features do not yield a similar improvement. We also observe that sequential representations of QSR features yield the best classification performance. A by-product of our learning method is a simple approach to representing 3D activities qualitatively as compositions of 2D actions, which can be visualized and learned using two-dimensional QSR.
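To make the idea of a qualitative, sequential feature more concrete, here is a minimal sketch of how one such feature could be computed for a hand and an object tracked in 3D. Each trajectory is projected onto 2D planes, and every frame transition is summarized with the basic Qualitative Trajectory Calculus (QTC_B), which records whether each point moves toward the other ('-'), away from it ('+'), or neither ('0'). This is a hypothetical illustration, not the authors' pipeline and not QSRlib's API: the choice of QTC_B, the three axis-aligned projection planes, the discrete distance comparison, the 0.01 threshold, and all function names are assumptions.

```python
import math
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

# Assumption: 3D motion is decomposed into the three axis-aligned 2D projections.
PLANES: Dict[str, Tuple[int, int]] = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}


def project(p: Point3D, axes: Tuple[int, int]) -> Point2D:
    """Keep two coordinates of a 3D point, giving its position on one plane."""
    return (p[axes[0]], p[axes[1]])


def qtc_b_symbol(d_before: float, d_after: float, threshold: float) -> str:
    """Map a change in distance to a QTC_B symbol ('-' closer, '+' farther, '0' stable)."""
    delta = d_after - d_before
    if delta < -threshold:
        return "-"
    if delta > threshold:
        return "+"
    return "0"


def qtc_b_2d(k: Sequence[Point2D], l: Sequence[Point2D], threshold: float) -> List[str]:
    """QTC_B state for each transition t-1 -> t of two equal-length 2D trajectories."""
    states = []
    for t in range(1, len(k)):
        # Assumed discrete approximation: judge each point's motion against
        # the other point's position at the previous frame.
        sk = qtc_b_symbol(math.dist(k[t - 1], l[t - 1]), math.dist(k[t], l[t - 1]), threshold)
        sl = qtc_b_symbol(math.dist(l[t - 1], k[t - 1]), math.dist(l[t], k[t - 1]), threshold)
        states.append(sk + sl)
    return states


def qualitative_features(
    hand: Sequence[Point3D], obj: Sequence[Point3D], threshold: float = 0.01
) -> List[Tuple[str, ...]]:
    """One tuple per transition, holding the QTC_B state on each projection plane."""
    per_plane = {
        name: qtc_b_2d([project(p, axes) for p in hand],
                       [project(p, axes) for p in obj], threshold)
        for name, axes in PLANES.items()
    }
    return [tuple(per_plane[name][t] for name in PLANES) for t in range(len(hand) - 1)]


def collapse(states: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Optional: merge consecutive identical states into a sparser sequence."""
    return [state for state, _ in groupby(states)]


if __name__ == "__main__":
    hand = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]  # moves along x
    obj = [(2.0, 0.0, 0.0)] * 3                                   # stays still
    states = qualitative_features(hand, obj)
    print(states)            # [('-0', '-0', '00'), ('-0', '-0', '00')]
    print(collapse(states))  # [('-0', '-0', '00')]
```

In this toy example the hand approaches a stationary object along the x-axis, so the approach is visible in the xy and xz projections but not in yz. A quantitative counterpart of the same feature would keep the raw distance changes rather than thresholding them into three symbols.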
