Mining representative actions for actor identification

Previous work on actor identification has focused mainly on static cues such as face recognition and costume detection, ignoring the abundant dynamic information contained in videos. In this paper, we propose a novel method to mine each actor's representative actions, and show the remarkable power of such actions for the actor identification task. Videos are first divided into shots, each represented as a bag of words (BoW) over spatio-temporal features. We then integrate prototype theory with an SVM to rank the shots and select the representative actions. Our actor identification method combines these representative actions with the actors' appearance. We validate the method on episodes of the TV series "The Big Bang Theory". The experimental results show that the mined representative actions are consistent with human judgements and, as a complement to existing handcrafted static features, substantially improve matching performance for actor identification.
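The pipeline the abstract describes (shots → BoW histograms over spatio-temporal features → SVM-based shot ranking) can be sketched as follows. This is a minimal illustration, not the paper's implementation: real spatio-temporal descriptors (e.g. STIP with HOG/HOF) are replaced by synthetic data, the prototype-theory component is simplified to using the SVM decision value as a representativeness score, and all names (`bow_histogram`, `VOCAB_SIZE`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
VOCAB_SIZE = 50

def bow_histogram(descriptors, vocab):
    """Quantize one shot's local descriptors into a normalized BoW histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=VOCAB_SIZE).astype(float)
    return hist / max(hist.sum(), 1.0)

# Synthetic "shots": each is a set of 64-D spatio-temporal descriptors.
# Label 1 marks shots containing the target actor.
shots = [rng.normal(loc=(i % 2), size=(100, 64)) for i in range(40)]
labels = np.array([i % 2 for i in range(40)])

# Build the visual vocabulary (k-means codebook) from all descriptors.
vocab = KMeans(n_clusters=VOCAB_SIZE, n_init=4, random_state=0)
vocab.fit(np.vstack(shots))

# Represent every shot as a BoW histogram.
X = np.array([bow_histogram(s, vocab) for s in shots])

# A linear SVM separates the target actor's shots from the rest;
# its decision value serves here as a stand-in representativeness score.
svm = LinearSVC(C=1.0).fit(X, labels)
scores = svm.decision_function(X)

# Rank the target actor's shots by score: the top-ranked ones are the
# candidate "representative actions" for that actor.
actor_shots = np.where(labels == 1)[0]
ranked = actor_shots[np.argsort(-scores[actor_shots])]
print("top-5 representative shots:", ranked[:5].tolist())
```

In the paper's setting the ranking additionally incorporates prototype theory (cf. Rosch's work on categorization), so the SVM score above should be read only as the discriminative half of that criterion.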
