Recognizing Actions in Motion Trajectories Using Deep Neural Networks

This paper reports on the progress of a co-creative pretend play agent designed to interact with users by recognizing and responding to playful actions in a 2D virtual environment. In particular, we describe the design and evaluation of a classifier that recognizes the 2D motion trajectories produced by the user's actions. We evaluate the classifier on a publicly available dataset of labeled actions highly relevant to the domain of pretend play, and show that deep convolutional neural networks recognize these actions significantly more accurately than previously employed methods. We also describe our plan for a virtual play environment, built around this classifier, in which the user and the agent can collaboratively construct narratives during improvisational pretend play.
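As a minimal sketch of how a 2D motion trajectory might be presented to a convolutional classifier (this preprocessing is an assumption for illustration, not the paper's exact pipeline), one common approach is to rasterize the sequence of (x, y) points into a fixed-size grayscale image that a standard image CNN can consume:

```python
import numpy as np

def rasterize_trajectory(points, size=64):
    """Rasterize a sequence of (x, y) points into a size x size
    grayscale image suitable as CNN input.

    points: array-like of shape (n, 2). Coordinates are normalized
    to the unit square before being binned into pixel cells.
    """
    pts = np.asarray(points, dtype=float)
    # Normalize each axis to [0, 1], guarding against zero span.
    mins = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - mins, 1e-8)
    norm = (pts - mins) / span
    # Bin normalized coordinates into integer pixel indices.
    idx = np.minimum((norm * (size - 1)).astype(int), size - 1)
    img = np.zeros((size, size), dtype=np.float32)
    img[idx[:, 1], idx[:, 0]] = 1.0  # row = y, column = x
    return img

# Example: a straight diagonal motion trajectory of 100 samples.
traj = np.stack([np.linspace(0, 10, 100), np.linspace(0, 5, 100)], axis=1)
image = rasterize_trajectory(traj)
print(image.shape)  # (64, 64)
```

The resulting image (or a stack of such images over time windows) could then be fed to an off-the-shelf convolutional network; the function name and image size here are illustrative choices, not values taken from the paper.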
