Rapid Localisation and Retrieval of Human Actions with Relevance Feedback

The growing volume of multimedia data online demands more sophisticated methods of organisation. We present a practical system for rapid localisation and retrieval of human actions in large video databases. We first temporally segment the database and calculate a histogram-match score for each segment against the query. High-scoring adjacent segments are joined into candidate localised regions using a noise-robust localisation algorithm, and each candidate region is then ranked against the query. Experiments show that this method is more efficient than previous approaches to action search with localisation. We also demonstrate how results can be improved with relevance feedback, and consider how relevance feedback can be applied effectively in the context of localisation.
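
The pipeline described above (score each segment, join high-scoring neighbours, rank the candidate regions) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the match score or the localisation rule, so the sketch assumes normalised histogram intersection as the score and a simple gap-tolerant merge as the noise-robust step. All function names, `threshold` and `max_gap` are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # inclusive (first_segment, last_segment)


def normalise(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (all zeros if it is empty)."""
    total = sum(hist)
    return [x / total for x in hist] if total else [0.0] * len(hist)


def match_score(query: Sequence[float], segment: Sequence[float]) -> float:
    """Histogram intersection of the normalised histograms (assumed metric)."""
    q, s = normalise(query), normalise(segment)
    return sum(min(a, b) for a, b in zip(q, s))


def localise(scores: Sequence[float], threshold: float, max_gap: int = 1) -> List[Region]:
    """Join segments scoring >= threshold into regions.

    Up to max_gap consecutive low-scoring segments are bridged, a simple
    stand-in (assumption) for the noise-robust localisation algorithm.
    """
    regions: List[Region] = []
    start = last_hit = None
    for i, score in enumerate(scores):
        if score < threshold:
            continue
        if start is None:
            start = i
        elif i - last_hit - 1 > max_gap:
            # Gap too long to bridge: close the current region, open a new one.
            regions.append((start, last_hit))
            start = i
        last_hit = i
    if start is not None:
        regions.append((start, last_hit))
    return regions


def rank_regions(query: Sequence[float],
                 segment_hists: Sequence[Sequence[float]],
                 regions: Sequence[Region]) -> List[Tuple[float, Region]]:
    """Pool each region's segment histograms and rank regions by match score."""
    ranked = []
    for start, end in regions:
        pooled = [sum(bins) for bins in zip(*segment_hists[start:end + 1])]
        ranked.append((match_score(query, pooled), (start, end)))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy data: one 3-bin histogram per temporal segment (illustrative values only).
query = [2, 0, 2]
segments = [[0, 4, 0], [2, 0, 2], [0, 4, 0], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 0]]
scores = [match_score(query, seg) for seg in segments]
regions = localise(scores, threshold=0.5, max_gap=1)
ranking = rank_regions(query, segments, regions)
```

In this toy run, segments 1 and 3 are merged across the single low-scoring segment between them, while segment 6 forms its own region because two low-scoring segments separate it from the previous hit. Pooling histograms before ranking is one possible design choice. Ranking each region by its mean segment score would be another.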
