论文信息 - Sequential Max-Margin Event Detectors

Sequential Max-Margin Event Detectors

Many applications in computer vision (e.g., games, human computer interaction) require a reliable and early detector of visual events. Existing event detection methods rely on one-versus-all or multi-class classifiers that do not scale well to online detection of large number of events. This paper proposes Sequential Max-Margin Event Detectors (SMMED) to efficiently detect an event in the presence of a large number of event classes. SMMED sequentially discards classes until only one class is identified as the detected class. This approach has two main benefits w.r.t. standard approaches: (1) It provides an efficient solution for early detection of events in the presence of large number of classes, and (2) it is computationally efficient because only a subset of likely classes are evaluated. The benefits of SMMED in comparison with existing approaches is illustrated in three databases using different modalities: MSRDaliy Activity (3D depth videos), UCF101 (RGB videos) and the CMU-Multi-Modal Action Detection (MAD) database (depth, RGB and skeleton). The CMU-MAD was recorded to target the problem of event detection (not classification), and the data and labels are available at http://humansensing.cs.cmu.edu/mad/ .

[1] Cristian Sminchisescu,et al. Conditional Random Fields for Contextual Human Motion Recognition , 2005, ICCV.

[2] Ying Wu,et al. Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Wei Niu,et al. Human activity detection and recognition for video surveillance , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[4] J.K. Aggarwal,et al. Human activity analysis , 2011, ACM Comput. Surv..

[5] James M. Rehg,et al. Learning and Inferring Motion Patterns using Parametric Segmental Switching Linear Dynamic Systems , 2008, International Journal of Computer Vision.

[6] Fernando De la Torre,et al. Max-Margin Early Event Detectors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Jason Weston,et al. Label Embedding Trees for Large Multi-Class Tasks , 2010, NIPS.

[8] Jake K. Aggarwal,et al. Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Koby Crammer,et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[10] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11] S. Mitra,et al. Gesture Recognition: A Survey , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[12] Kent Larson,et al. Real-Time Recognition of Physical Activities and Their Intensities Using Wireless Accelerometers and a Heart Rate Monitor , 2007, 2007 11th IEEE International Symposium on Wearable Computers.

[13] Juan Carlos Niebles,et al. Unsupervised Learning of Human Action Categories , 2006 .

[14] Olivier Chapelle,et al. Training a Support Vector Machine in the Primal , 2007, Neural Computation.

[15] Anthony Hoogs,et al. Learning and recognizing complex multi-agent activities with applications to american football plays , 2012, 2012 IEEE Workshop on the Applications of Computer Vision (WACV).

[16] Martial Hebert,et al. Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17] Ramakant Nevatia,et al. Video-based event recognition: activity representation and probabilistic recognition methods , 2004, Comput. Vis. Image Underst..

[18] Matthew Brand,et al. Discovery and Segmentation of Activities in Video , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[19] Luc Van Gool,et al. Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Fernando De la Torre,et al. Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[21] Fernando De la Torre,et al. Action unit detection with segment-based SVMs , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22] D.P. Siewiorek,et al. Wearable computers , 1994, IEEE Potentials.

[23] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[24] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..