A combination of generative and discriminative models for fast unsupervised activity recognition from traffic scene videos

Recent approaches to traffic and crowd scene analysis make extensive use of non-parametric hierarchical Bayesian models to cluster features into activities. Although these models have yielded impressive results, they require time-consuming Bayesian inference during both training and classification. We therefore limit Bayesian inference to the training stage, where unsupervised clustering extracts semantically meaningful activities from the scene. In the testing stage, we use discriminative classifiers, taking advantage of their relative simplicity and fast inference. Experiments on publicly available datasets show that our approach is comparable in classification accuracy to state-of-the-art methods and provides a significant speed-up in the testing phase.
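
The two-stage idea can be illustrated with a small sketch. It is not the authors' implementation: k-means stands in for the non-parametric Bayesian clustering, a multiclass perceptron stands in for the discriminative classifier, and the toy two-dimensional "clips" and all function names are hypothetical. It only shows that the costly clustering runs once at training time, while a test clip is labelled with a few dot products.

```python
# Toy sketch: unsupervised clustering at training time, cheap linear
# classification at test time. All names and data here are hypothetical.

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster(points, k, iters=20):
    """Stand-in for the Bayesian clustering stage (plain k-means here)."""
    # Deterministic farthest-point initialisation of the k centres.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment against the last set of centres.
    return [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]

def classify(weights, point):
    """Test-time inference: one dot product per activity class."""
    x = list(point) + [1.0]  # append a constant bias feature
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(weights)), key=lambda j: scores[j])

def train_classifier(points, labels, k, epochs=50):
    """Stand-in discriminative stage: multiclass perceptron on cluster labels."""
    weights = [[0.0] * (len(points[0]) + 1) for _ in range(k)]
    for _ in range(epochs):
        mistakes = 0
        for p, y in zip(points, labels):
            pred = classify(weights, p)
            if pred != y:
                x = list(p) + [1.0]
                for i, xi in enumerate(x):
                    weights[y][i] += xi      # pull the true class towards x
                    weights[pred][i] -= xi   # push the wrong class away
                mistakes += 1
        if mistakes == 0:
            break
    return weights

# Hypothetical clip descriptors: (horizontal-motion count, vertical-motion count).
train_clips = [(9, 1), (10, 2), (11, 0), (10, 1),
               (1, 9), (2, 10), (0, 11), (1, 10)]

pseudo_labels = cluster(train_clips, k=2)                   # training only
weights = train_classifier(train_clips, pseudo_labels, k=2)  # training only

# Testing: no clustering or Bayesian inference, just the linear classifier.
print(classify(weights, (8, 2)), classify(weights, (2, 8)))
```

In this sketch the cluster indices act as pseudo-labels, so no manual annotation is needed; any classifier that can be trained on such labels could replace the perceptron.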
