Hierarchical Dirichlet Processes for unsupervised online multi-view action perception using Temporal Self-Similarity features

In many real-world applications of distributed and multi-view vision systems, the ability to learn previously unseen actions online is essential, because most actions are not known, or sufficient training data is not available, at design time. We propose a novel approach that combines the unsupervised learning capabilities of Hierarchical Dirichlet Processes (HDP) with Temporal Self-Similarity Map (SSM) representations, which have been shown to be suitable for aggregating multi-view information without further model knowledge. Because the HDP model is almost completely data-driven, the resulting system works almost "out-of-the-box". Experiments on the extensive JAR-AIBO dataset show promising results, with clustering accuracies of up to 60% for a 56-class problem.
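
As a rough illustration of the temporal self-similarity idea mentioned above, the sketch below computes a self-similarity map as the matrix of pairwise distances between per-frame feature vectors of one sequence. The abstract does not specify the frame descriptor or the distance measure, so the Euclidean distance, the function name `self_similarity_map`, and the toy feature values are assumptions made for illustration only. The sketch covers neither the multi-view aggregation nor the HDP clustering step.

```python
import numpy as np


def self_similarity_map(features):
    """Return the T x T matrix of pairwise Euclidean distances between frames.

    `features` has shape (num_frames, feature_dim). The Euclidean distance
    is an assumption; the source does not state which measure is used.
    """
    x = np.asarray(features, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected an array of shape (num_frames, feature_dim)")
    # Broadcast to shape (T, T, D), then reduce over the feature axis.
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical toy sequence: 4 frames with 2-D features, alternating
# between two poses, so the map shows a periodic pattern.
frames = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [3.0, 4.0]])
ssm = self_similarity_map(frames)
print(ssm)
```

Entry (i, j) of the map depends only on how frames i and j of the same sequence relate to each other, so it is symmetric with a zero diagonal. Repeated poses show up as near-zero off-diagonal entries.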
