Unsupervised random forest indexing for fast action search

Despite recent successes of searching small object in images, it remains a challenging problem to search and locate actions in crowded videos because of (1) the large variations of human actions and (2) the intensive computational cost of searching the video space. To address these challenges, we propose a fast action search and localization method that supports relevance feedback from the user. By characterizing videos as spatio-temporal interest points and building a random forest to index and match these points, our query matching is robust and efficient. To enable efficient action localization, we propose a coarse-to-fine sub-volume search scheme, which is several orders faster than the existing video branch and bound search. The challenging cross-dataset search of several actions validates the effectiveness and efficiency of our method.

[1]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[3]  Zicheng Liu,et al.  Cross-dataset action detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Richard P. Wildes,et al.  Efficient action spotting based on a spacetime oriented structure representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Luc Van Gool,et al.  A Hough transform-based voting framework for action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[7]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Christophe Marsala,et al.  High scale video mining with forests of fuzzy decision trees , 2008, CSTST.

[9]  Ying Wu,et al.  Speeding up spatio-temporal sliding-window search for efficient event detection in crowded videos , 2009, EiMM '09.

[10]  Christoph H. Lampert Detecting objects in large image collections and videos by efficient subimage retrieval , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Jiebo Luo,et al.  Utilizing semantic word similarity measures for video retrieval , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[14]  Krystian Mikolajczyk,et al.  Action recognition with motion-appearance vocabulary forest , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  James M. Rehg,et al.  Quasi-periodic event analysis for social game retrieval , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Cordelia Schmid,et al.  Mining Visual Actions from Movies , 2009, BMVC.

[20]  Gang Yu,et al.  Fast Action Detection via Discriminative Random Forest Voting and Top-K Subvolume Search , 2011, IEEE Transactions on Multimedia.

[21]  Patrick Pérez,et al.  Retrieving actions in movies , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Junsong Yuan,et al.  Efficient search of Top-K video subvolumes for multi-instance action detection , 2010, 2010 IEEE International Conference on Multimedia and Expo.

[23]  Ying Wu,et al.  Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Nazli Ikizler-Cinbis,et al.  Learning actions from the Web , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Mubarak Shah,et al.  Incremental action recognition using feature-tree , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[27]  Yu Qian,et al.  Storyboard sketches for Content Based Video Retrieval , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Larry S. Davis,et al.  Recognizing actions by shape-motion prototype trees , 2009, 2009 IEEE 12th International Conference on Computer Vision.