The Visual Social Distancing Problem

One of the most effective measures to contain the recent viral outbreak is maintaining so-called Social Distancing (SD). To enforce this measure, governments are adopting restrictions on the minimum inter-personal distance. Given this current scenario, it is crucial to measure compliance with this physical constraint on a massive scale, in order to identify the reasons for possible breaches of the distance limitations and to understand whether they imply a potential threat. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image, and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis of whether people comply with the SD restriction, and for providing statistics about the level of safety of specific areas whenever this constraint is violated. We first point out that measuring VSD is not only a geometric problem: it also requires a deeper understanding of the social behaviour in the scene. The aim is to detect potentially dangerous situations reliably while avoiding false alarms (e.g., a family with children or relatives, an elder with their caregivers), all while complying with current privacy policies. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate a path towards new Computer Vision methods that could provide a solution to this problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios.
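The purely geometric part of VSD can be illustrated with a minimal sketch. It is not the method proposed here. It assumes that a 3x3 homography from image pixels to metric ground-plane coordinates is already known and that each person's foot position in the image has already been detected. The names `to_ground` and `close_pairs`, the example homography, and the 2.0 m threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations

def to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane coordinates using a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # divide by w to leave homogeneous coordinates

def close_pairs(H, feet_px, min_dist_m=2.0):
    """Return (i, j, distance) for every pair of people closer than min_dist_m.

    feet_px: list of (u, v) pixel positions where each person touches the ground.
    min_dist_m: assumed threshold in metres; the actual value depends on local rules.
    """
    ground = [to_ground(H, u, v) for u, v in feet_px]
    pairs = []
    for i, j in combinations(range(len(ground)), 2):
        d = math.hypot(ground[i][0] - ground[j][0], ground[i][1] - ground[j][1])
        if d < min_dist_m:
            pairs.append((i, j, d))
    return pairs

# Hypothetical homography: a pure scaling of 100 pixels per metre.
H_EXAMPLE = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical foot positions of three people, in pixels.
FEET_EXAMPLE = [(100, 100), (200, 100), (600, 100)]

violations = close_pairs(H_EXAMPLE, FEET_EXAMPLE)
```

Such a check flags every pair below the threshold, including people who legitimately stay close (a family, an elder with a caregiver). This is why the geometric estimate alone is insufficient and needs to be combined with an understanding of the social context.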
