Deep understanding of shopper behaviours and interactions using RGB-D vision

In retail environments, understanding how shoppers move through a store’s spaces and interact with products is very valuable. Although the retail environment has several characteristics that favour computer vision, such as reasonable lighting, the large number and diversity of products sold and the potential ambiguity of shoppers’ movements mean that accurately measuring shopper behaviour remains challenging. In recent years, machine-learning and feature-based tools for people counting, interaction analytics and re-identification have been developed with the aim of learning shopper skills from occlusion-free RGB-D cameras in a top-view configuration. However, with the advent of multimedia big data, machine-learning approaches have evolved into deep learning approaches, which are a more powerful and efficient way of dealing with the complexities of human behaviour. This paper introduces VRAI, a novel deep learning application that uses three convolutional neural networks to count the number of people passing or stopping in the camera area, perform top-view re-identification and measure shopper–shelf interactions from a single RGB-D video flow with near real-time performance. The framework is evaluated on three new, publicly available datasets: TVHeads for people counting, HaDa for shopper–shelf interactions and TVPR2 for people re-identification. The experimental results show that the proposed methods significantly outperform all competitive state-of-the-art methods (accuracy of 99.5% on people counting, 92.6% on interaction classification and 74.5% on re-id), providing distinct and significant insights for implicit and extensive shopper behaviour analysis in marketing applications.
