Paper information - Event Detection in Continuous Video: An Inference in Point Process Approach

We propose a novel approach toward event detection in real-world continuous video sequences. The method: 1) is able to model arbitrary-order non-Markovian dependences in videos to mitigate local visual ambiguities; 2) conducts simultaneous event segmentation and labeling; and 3) is time-window free. The idea is to represent a video as an event stream of both high-level semantic events and low-level video observations. In training, we learn a point process model called a piecewise-constant conditional intensity model (PCIM) that is able to capture complex non-Markovian dependences in the event streams. In testing, event detection can be modeled as the inference of high-level semantic events, given low-level image observations. We develop the first inference algorithm for PCIM and show it samples exactly from the posterior distribution. We then evaluate the video event detection task on real-world video sequences. Our model not only provides competitive results on the video event segmentation and labeling task, but also provides benefits, including being interpretable and efficient.
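As a rough illustration of what "piecewise-constant conditional intensity" means for an event stream, the sketch below computes the log-likelihood of high-level events given low-level observations in a toy setting. It is not the paper's model or its inference algorithm. The function name, the single history feature ("was there an observation in the last `window` time units?"), and the two rates are all assumptions made for this example.

```python
import math


def toy_pcim_log_likelihood(obs_times, event_times, horizon, window,
                            rate_active, rate_idle):
    """Log-likelihood of `event_times` on [0, horizon] under a toy model.

    Assumed (hypothetical) dependence: the intensity of high-level events is
    `rate_active` whenever some low-level observation occurred within the
    preceding `window` time units, and `rate_idle` otherwise. The intensity is
    therefore constant between consecutive breakpoints.
    """
    # Breakpoints: the intensity can only change when an observation
    # arrives or when its influence window expires.
    cuts = {0.0, horizon}
    for t in obs_times:
        cuts.add(min(t, horizon))
        cuts.add(min(t + window, horizon))
    cuts = sorted(cuts)

    loglik = 0.0
    for start, end in zip(cuts, cuts[1:]):
        # The state is constant on the segment, so test it at the midpoint.
        mid = 0.5 * (start + end)
        active = any(t <= mid < t + window for t in obs_times)
        rate = rate_active if active else rate_idle

        # Events in (start, end] see the history that defines this segment.
        count = sum(1 for e in event_times if start < e <= end)

        # Constant-rate contribution: count * log(rate) - rate * duration.
        loglik += count * math.log(rate) - rate * (end - start)
    return loglik


if __name__ == "__main__":
    ll = toy_pcim_log_likelihood(
        obs_times=[1.0], event_times=[1.5, 3.0], horizon=4.0,
        window=1.0, rate_active=2.0, rate_idle=0.5)
    print(ll)
```

With the inputs above, the stream splits into three segments (idle, active, idle) and the log-likelihood comes out to approximately -3.5. In this toy setting, event detection would amount to asking which placements of high-level events are probable given the observations, which is the role the abstract assigns to posterior inference.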

Christian R. Shelton | Zhen Qin
