EGO-CH: Dataset and Fundamental Tasks for Visitors' Behavioral Understanding Using Egocentric Vision

Abstract Equipping visitors of a cultural site with a wearable device makes it easy to collect information about their preferences, which can be exploited to improve the enjoyment of cultural goods through augmented reality. Moreover, egocentric video can be processed with computer vision and machine learning to enable automated analysis of visitors' behavior. The inferred information can be used both online, to assist the visitor, and offline, to support the manager of the site. Despite the positive impact such technologies can have on cultural heritage, the topic is currently understudied due to the limited number of public datasets suitable for studying the considered problems. To address this issue, in this paper we propose EGOcentric-Cultural Heritage (EGO-CH), the first dataset of egocentric videos for visitors' behavior understanding in cultural sites. The dataset has been collected in two cultural sites and includes more than 27 hours of video acquired by 70 subjects, with labels for 26 environments and over 200 different Points of Interest. A large subset of the dataset, consisting of 60 videos, is associated with surveys filled out by real visitors. To encourage research on the topic, we propose four challenging tasks (room-based localization, point of interest/object recognition, object retrieval, and survey prediction) useful for understanding visitors' behavior, and we report baseline results on the dataset.
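
To make the room-based localization task concrete, below is a minimal sketch of a frame-level room classifier. It is only an illustration: the directory layout (EGO-CH/frames/train with one sub-folder per room), the ResNet-18 backbone, and all hyperparameters are assumptions for the sake of the example, not the dataset's actual format or the paper's baseline implementation.

    # Minimal sketch of a frame-level room classifier for the room-based
    # localization task. The folder layout, backbone, and hyperparameters
    # are illustrative assumptions, not the dataset's actual format.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    NUM_ROOMS = 26  # EGO-CH provides labels for 26 environments

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Assumed layout: one sub-folder per room, e.g. EGO-CH/frames/train/room_01/
    train_set = datasets.ImageFolder("EGO-CH/frames/train", transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    # ImageNet-pretrained backbone with a new classification head over rooms.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_ROOMS)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for images, room_labels in loader:  # one epoch shown for brevity
        optimizer.zero_grad()
        loss = criterion(model(images), room_labels)
        loss.backward()
        optimizer.step()

At test time, per-frame predictions of such a classifier would typically be smoothed over time to obtain a room-level segmentation of the whole visit, in the spirit of the localization baselines reported in the paper.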
