First-Person Animal Activity Recognition from Egocentric Videos

This paper introduces the concept of first-person animal activity recognition: the problem of recognizing activities from the viewpoint of an animal (e.g., a dog). As in first-person activity recognition scenarios where humans wear cameras, our approach estimates the activities performed by an animal wearing a camera. This enables monitoring and understanding of natural animal behaviors even when no people are around. Applications include automated logging of animal behaviors for medical/biology experiments, monitoring of pets, and investigation of wildlife patterns. We construct a new dataset of first-person animal videos obtained by mounting a camera on each of four pet dogs. The dataset covers 10 activities containing a heavy/fair amount of ego-motion. We implement multiple baseline approaches that recognize activities from such videos using multiple types of global/local motion features. The baselines recognize animal ego-actions as well as human-animal interactions, and we discuss the experimental results.
