WiseNET: An indoor multi-camera multi-space dataset with contextual information and annotations for people detection and tracking

Nowadays, camera networks are part of our everyday environments and consequently represent a massive source of information for monitoring human activities and proposing new services to building users. To perform human activity monitoring, people must be detected and the analysis has to take into account information about the environment and the context. Available multi-camera datasets provide videos with little (or no) information about the environment where the network was deployed. The proposed dataset provides multi-camera, multi-space video sets along with the complete contextual information of the environment. The dataset groups 11 video sets (composed of 62 single videos) recorded with 6 indoor cameras deployed across multiple spaces. The video sets represent more than 1 h of video footage, include 77 people tracks and capture different human actions such as walking around, standing/sitting, remaining motionless, entering/leaving a space and group merging/splitting. Moreover, each video has been manually and automatically annotated with people detection and tracking meta-information. The automatic people detection annotations were obtained using detectors of varying complexity and robustness, from classical machine-learning methods to state-of-the-art deep Convolutional Neural Network (CNN) models. Concerning the contextual information, the Industry Foundation Classes (IFC) file that represents the environment's Building Information Modeling (BIM) data is also provided. The BIM/IFC file describes the complete structure of the environment, its topology and the elements it contains. To our knowledge, the WiseNET dataset is the first to provide a set of videos along with the complete information of the environment. The WiseNET dataset is publicly available at https://doi.org/10.4121/uuid:c1fb5962-e939-4c51-bfd5-eac6f2935d44, as well as at the project's website http://wisenet.checksem.fr/#/dataset.
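Since the dataset pairs manual (ground-truth) boxes with automatic detections, a typical use is to score a detector against the manual annotations. The sketch below shows the standard intersection-over-union (IoU) matching scheme commonly used for this purpose (e.g., in the PASCAL VOC evaluation); the function names and the box format `(x1, y1, x2, y2)` are illustrative assumptions, not the dataset's actual annotation schema.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(detections, ground_truth, threshold=0.5):
    """Greedily match automatic detections to manual boxes.

    Returns (true positives, false positives, false negatives) for one frame.
    The 0.5 IoU threshold follows the common PASCAL VOC convention.
    """
    unmatched_gt = list(ground_truth)
    tp = 0
    for det in detections:
        # Pick the still-unmatched ground-truth box with the highest overlap.
        best = max(unmatched_gt, key=lambda g: iou(det, g), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched_gt.remove(best)
            tp += 1
    fp = len(detections) - tp   # detections with no matching annotation
    fn = len(unmatched_gt)      # annotated people the detector missed
    return tp, fp, fn
```

Aggregating these per-frame counts over a video set yields precision and recall for each of the provided detectors (HOG-based, SSD, YOLOv3, etc.).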
