Video event detection using generalized subclass discriminant analysis and linear support vector machines

In this paper, a two-phase approach to event detection in video is proposed. It combines a novel nonlinear Discriminant Analysis (DA) method, called Generalized Subclass DA (GSDA), which identifies a discriminant subspace, with a Linear Support Vector Machine (LSVM), which efficiently learns the event in the derived subspace. The proposed GSDA-LSVM framework is used as an alternative to the Kernel Support Vector Machine (KSVM) approach. Despite its excellent classification accuracy, KSVM requires significant computational resources for learning the events (i.e., for identifying the kernel parameters and the KSVM penalty term) in large-scale video collections. In contrast, with the GSDA-LSVM approach the SVM penalty term can be rapidly identified in the lower-dimensional subspace. Moreover, an additional speed-up in deriving this lower-dimensional subspace is achieved by using the proposed GSDA method instead of conventional nonlinear subclass DA methods such as KSDA or KMSDA. This is possible because GSDA exploits the special structure of the inter-between-subclass scatter matrix to reformulate the original KSDA eigenvalue problem into one involving matrices of much smaller dimension. Experimental results on the extensive TRECVID MED 2010 and 2012 datasets show that the proposed GSDA-LSVM approach yields both more accurate event detection and gains in computational efficiency.
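The two-phase idea can be illustrated with a minimal NumPy sketch. It is not the authors' GSDA. As a stand-in, it uses a generic kernel subclass discriminant step in which the between-subclass scatter (taken over pairs of subclasses from different classes) is factored through the H subclass means. This lets the eigendecomposition run on an H×H matrix instead of an N×N one, although the linear solve with the N×N regularized total-scatter matrix remains. A linear SVM is then trained in the resulting (H−1)-dimensional subspace by plain subgradient descent on the regularized hinge loss. The RBF kernel, the ridge term `reg`, the caller-supplied subclass labels `s`, the toy data, and all function and class names are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def between_subclass_weights(y, s):
    """H x H matrix G such that S_b = M G M^T, with M holding the subclass means.

    Only pairs of subclasses from different classes contribute, each weighted
    by the product of the two subclass priors (an assumed, SDA-style choice).
    """
    H = s.max() + 1
    p = np.bincount(s, minlength=H) / len(s)
    cls = np.array([y[s == h][0] for h in range(H)])
    G = np.zeros((H, H))
    for a in range(H):
        for b in range(a + 1, H):
            if cls[a] != cls[b]:
                w = p[a] * p[b]
                G[a, a] += w
                G[b, b] += w
                G[a, b] -= w
                G[b, a] -= w
    return G


def fit_subspace(X, y, s, gamma, reg):
    """Kernel subclass DA solved through a reduced H x H eigenproblem."""
    N, H = X.shape[0], s.max() + 1
    K = rbf_kernel(X, X, gamma)
    E = np.zeros((N, H))  # column h averages the samples of subclass h
    E[np.arange(N), s] = 1.0 / np.bincount(s, minlength=H)[s]
    evals, U = np.linalg.eigh(between_subclass_weights(y, s))
    L = U * np.sqrt(np.clip(evals, 0.0, None))  # G = L @ L.T
    B = K @ E @ L                               # numerator K E G E^T K = B @ B.T
    J = np.eye(N) - 1.0 / N                     # centering matrix I - 11^T / N
    S = K @ J @ K + reg * np.eye(N)             # total scatter plus ridge
    SinvB = np.linalg.solve(S, B)
    # B B^T a = lam S a has the same nonzero eigenvalues as the small problem
    # (B^T S^-1 B) v = lam v; eigenvectors map back via a = S^-1 B v.
    M = B.T @ SinvB
    vals, V = np.linalg.eigh((M + M.T) / 2.0)
    top = np.argsort(vals)[::-1][: H - 1]
    return SinvB @ V[:, top]                    # N x (H - 1) expansion coefficients


def train_linear_svm(Z, y, lam=1e-3, lr=0.1, epochs=2000):
    """Primal linear SVM: subgradient descent on lam/2 ||w||^2 + mean hinge loss."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (Z @ w + b) < 1.0            # samples violating the margin
        grad_w = lam * w - (y[viol][:, None] * Z[viol]).sum(0) / len(y)
        grad_b = -y[viol].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


class SubspaceLinearSVM:
    """Phase 1: kernel subclass DA subspace. Phase 2: linear SVM inside it."""

    def __init__(self, gamma=0.5, reg=1e-2):
        self.gamma, self.reg = gamma, reg

    def _project(self, X):
        Z = rbf_kernel(X, self.X_train, self.gamma) @ self.A
        return (Z - self.mu) / self.sd

    def fit(self, X, y, s):
        self.X_train = X
        self.A = fit_subspace(X, y, s, self.gamma, self.reg)
        Z = rbf_kernel(X, X, self.gamma) @ self.A
        self.mu, self.sd = Z.mean(0), Z.std(0) + 1e-12  # standardize features
        self.w, self.b = train_linear_svm((Z - self.mu) / self.sd, y)
        return self

    def predict(self, X):
        return np.where(self._project(X) @ self.w + self.b >= 0.0, 1, -1)


# Hypothetical toy data: the positive "event" class has two subclasses (left and
# right clusters) and the negative class sits between them, so no single line
# separates the classes in the original 2-D input space.
rng = np.random.default_rng(0)
centers = [(-4.0, 0.0), (4.0, 0.0), (0.0, 0.0)]
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
s = np.repeat([0, 1, 2], 30)
y = np.where(s < 2, 1, -1)
clf = SubspaceLinearSVM().fit(X, y, s)
print("training accuracy:", (clf.predict(X) == y).mean())
```

In this sketch the SVM sees only H−1 features, so re-fitting it for different values of its regularization weight `lam` is inexpensive once the subspace has been computed.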
