Content-based retrieval of human actions from realistic video databases

The growing volume of video data in databases, on the Internet and elsewhere calls for new methods of managing these data, which has led to the development of content-based video retrieval systems. We explore several recently developed action representation and information retrieval techniques in a human action retrieval system. These techniques include various means of local feature extraction; soft-assignment clustering; Bag-of-Words, vocabulary-guided and spatio-temporal pyramid matches for action representation; and SVMs and ABRS-SVMs (asymmetric bagging and random subspace SVMs) for relevance feedback. Successful application of relevance feedback, in particular, would make such systems far more practical. We evaluate the performance of several combinations of these techniques on three realistic action datasets: UCF Sports, UCF YouTube and HOHA2.
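As an illustration of how soft-assignment clustering can feed a Bag-of-Words action representation, the following minimal sketch builds a histogram in which each local descriptor contributes to every codeword with a Gaussian weight, rather than voting only for its nearest one. The function name `soft_bow_histogram`, the Gaussian kernel and the bandwidth `sigma` are assumptions made for this example; the abstract does not specify the exact weighting scheme used in the evaluated system.

```python
import numpy as np

def soft_bow_histogram(descriptors, codebook, sigma=1.0):
    """Soft-assignment Bag-of-Words histogram (illustrative sketch).

    descriptors: (n, d) array of local feature descriptors from one video.
    codebook:    (k, d) array of codewords (e.g. cluster centres).
    sigma:       assumed Gaussian bandwidth controlling assignment softness.
    Returns an L1-normalised histogram of length k.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    if descriptors.shape[0] == 0:
        # No local features detected: return an empty (all-zero) histogram.
        return np.zeros(codebook.shape[0])

    # Squared Euclidean distance from every descriptor to every codeword.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)

    # Shift each row by its minimum for numerical stability; this leaves
    # the normalised weights unchanged.
    sq_dist -= sq_dist.min(axis=1, keepdims=True)

    # Gaussian weights, normalised so each descriptor contributes a total of 1.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)

    # Accumulate over descriptors and normalise to unit L1 norm.
    hist = weights.sum(axis=0)
    return hist / hist.sum()
```

A histogram produced this way could then serve as the fixed-length input to an SVM-based relevance feedback loop; a small `sigma` approaches hard assignment, while a large `sigma` spreads each descriptor's vote more evenly across codewords.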
