EPIC-Tent: An Egocentric Video Dataset for Camping Tent Assembly

This paper presents an outdoor video dataset annotated with action labels, collected from 24 participants who wore two head-mounted cameras (a GoPro and an SMI eye tracker) while assembling a camping tent. In total, the dataset comprises 5.4 hours of recordings. Tent assembly involves manual interactions with non-rigid objects, such as spreading the tent, securing guylines, reading instructions, and opening the tent bag. A notable aspect of the dataset is that it reflects each participant's proficiency in completing or understanding the task, which leads to differences in action sequences and action durations across participants. Our dataset, called EPIC-Tent, also provides several new types of annotations for the two synchronised egocentric videos: task errors, self-rated uncertainty, and gaze position, in addition to the task action labels. We present baseline results on EPIC-Tent using a state-of-the-art method for offline and online action recognition and detection.
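To make the annotation types concrete, the following is a minimal, purely illustrative Python sketch of what one labelled segment might look like. The field names, value ranges, and record layout are our own assumptions for illustration only, not the released EPIC-Tent annotation format.

```python
# Purely hypothetical schema: field names and layout are assumptions for
# illustration, not the released EPIC-Tent annotation format.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TentSegment:
    """One labelled segment for a pair of synchronised egocentric videos."""
    participant_id: int                      # one of the 24 participants
    start_sec: float                         # segment start on the shared timeline
    end_sec: float                           # segment end on the shared timeline
    action_label: str                        # task action, e.g. "spread tent"
    is_error: bool                           # segment marked as a task error
    uncertainty: Optional[int]               # self-rated uncertainty, if reported
    gaze_xy: Optional[Tuple[float, float]]   # normalised gaze position (eye tracker)

# Example record for a hypothetical segment:
example = TentSegment(
    participant_id=3,
    start_sec=12.0,
    end_sec=18.5,
    action_label="open tent bag",
    is_error=False,
    uncertainty=2,
    gaze_xy=(0.48, 0.55),
)
print(example.action_label, example.end_sec - example.start_sec, "seconds")
```

A record along these lines would let the four annotation streams (actions, errors, uncertainty, gaze) be queried jointly over the two synchronised camera views.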
