Multi-Max-Margin Support Vector Machine for multi-source human action recognition

We propose a new ensemble-based classifier for multi-source human action recognition, the Multi-Max-Margin Support Vector Machine (MMM-SVM). The ensemble incorporates the decision values of multiple sources and makes its final prediction by merging the intrinsic decision strength of each source's features. Experiments on the benchmark IXMAS multi-view dataset (Weinland [1]) show that our multi-view system improves accuracy over single-view classification by 3-13% and consistently outperforms the direct-concatenation method. We further apply the ensemble technique to combine the decision values of contextual and motion information on the UCF Sports dataset (Liu, 2009 [2]); the results are comparable to the state of the art, which indicates the algorithm's potential for extension to other feature-fusion problems.
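As a rough illustration of decision-level fusion in general, the sketch below sums per-source decision values and picks the highest-scoring class. It is not the MMM-SVM formulation, which the abstract does not specify. The function name `fuse_decision_values`, the optional `weights` argument, and the toy numbers are all assumptions made for illustration. Each source is assumed to supply a matrix of per-class decision values, for example one-vs-rest SVM margins.

```python
import numpy as np

def fuse_decision_values(decision_values, weights=None):
    """Generic late fusion (illustrative only, not the MMM-SVM rule).

    decision_values: list of arrays, one per source, each of shape
        (n_samples, n_classes), e.g. one-vs-rest SVM margins.
    weights: optional per-source weights (hypothetical; uniform if omitted).
    Returns the predicted class index per sample and the fused scores.
    """
    # Stack into shape (n_sources, n_samples, n_classes)
    stacked = np.stack([np.asarray(d, dtype=float) for d in decision_values])
    if weights is None:
        weights = np.ones(len(decision_values))
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the source axis -> (n_samples, n_classes)
    fused = np.tensordot(weights, stacked, axes=1)
    return fused.argmax(axis=1), fused

# Toy decision values: two views, two samples, three classes (made-up numbers)
view_a = np.array([[0.2, 1.1, -0.5], [0.9, -0.3, 0.1]])
view_b = np.array([[1.5, 0.1, -0.2], [0.4, -0.1, 0.8]])
labels, fused = fuse_decision_values([view_a, view_b])
print(labels)
```

In this toy case the two views disagree when taken alone (view A prefers class 1 for the first sample, view B prefers class 2 for the second), and the fused scores settle each disagreement in favour of the more confident source. Direct concatenation, by contrast, would join the feature vectors before training a single classifier.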

[1] Du Tran, et al. Human Activity Recognition with Metric Learning, 2008, ECCV.

[2] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[3] Luc Van Gool, et al. Hough Forests for Object Detection, Tracking, and Action Recognition, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Alexei A. Efros, et al. Ensemble of exemplar-SVMs for object detection and beyond, 2011, 2011 International Conference on Computer Vision.

[5] Ling Shao, et al. Histogram of Body Poses and Spectral Regression Discriminant Analysis for Human Action Categorization, 2010, BMVC.

[6] Hui Cheng, et al. Evaluation of low-level features and their combinations for complex event detection in open source videos, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Rémi Ronfard, et al. Free viewpoint action recognition using motion history volumes, 2006, Comput. Vis. Image Underst.

[8] Shiliang Sun, et al. A review of optimization methodologies in support vector machines, 2011, Neurocomputing.

[9] Matthew B. Blaschko, et al. Learning equivariant structured output SVM regressors, 2011, 2011 International Conference on Computer Vision.

[10] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Hongbin Wang, et al. EigenBody: Analysis of body shape for gender from noisy images, 2010.

[12] Cordelia Schmid, et al. Action recognition by dense trajectories, 2011, CVPR 2011.

[13] Pascal Fua, et al. Making Action Recognition Robust to Occlusions and Viewpoint Changes, 2010, ECCV.

[14] Ling Shao, et al. Action recognition using Correlogram of Body Poses and spectral regression, 2011, 2011 18th IEEE International Conference on Image Processing.

[15] Cordelia Schmid, et al. Evaluation of Local Spatio-temporal Features for Action Recognition, 2009, BMVC.

[16] Pinar Duygulu Sahin, et al. A new pose-based representation for recognizing actions from multiple cameras, 2011, Comput. Vis. Image Underst.

[17] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[18] Ling Shao, et al. Silhouette Analysis-Based Action Recognition Via Exploiting Human Poses, 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[19] Rohini K. Srihari, et al. Incorporating prior knowledge with weighted margin support vector machines, 2004, KDD.

[20] Xinghua Sun, et al. Action recognition via local descriptors and holistic features, 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[21] Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[22] Wei Liu, et al. Double Fusion for Multimedia Event Detection, 2012, MMM.

[23] Mubarak Shah, et al. Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Silvio Savarese, et al. Cross-view action recognition via view knowledge transfer, 2011, CVPR 2011.

[25] Shuang Wu, et al. Multimodal feature fusion for robust event detection in web videos, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Huiyu Zhou, et al. Age classification using Radon transform and entropy based scaling SVM, 2011, BMVC.

[27] Ramakant Nevatia, et al. Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Cordelia Schmid, et al. Actions in context, 2009, CVPR.

[29] Xuelong Li, et al. Multitraining Support Vector Machine for Image Retrieval, 2006, IEEE Transactions on Image Processing.

[30] Jiebo Luo, et al. Recognizing realistic actions from videos, 2009, CVPR.

[31] R. Polikar, et al. Ensemble based systems in decision making, 2006, IEEE Circuits and Systems Magazine.

[32] Terrance E. Boult, et al. Multi-attribute spaces: Calibration for attribute fusion and similarity search, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33] Mubarak Shah, et al. A 3-dimensional sift descriptor and its application to action recognition, 2007, ACM Multimedia.