Detecting Social Groups in Crowded Surveillance Videos Using Visual Attention

In this paper we demonstrate that the current state-of-the-art social grouping methodology can be enhanced with visual attention estimation. In a surveillance environment it is possible to extract the gaze direction of pedestrians, a feature that can be used to improve social grouping estimation. We implement a state-of-the-art motion-based social grouping technique to establish a baseline. We then implement the same grouping with the visual attention feature added. By comparing the two techniques' success at finding social groups, we evaluate the effectiveness of including the visual attention feature. We test both methods on two datasets containing busy surveillance scenes. We find that including visual attention improves on motion-only social grouping. For the Oxford data, we see a 5.6% improvement in true positives and a 28.5% reduction in false positives. On other datasets we see up to a 50% reduction in false positives. The strength of the visual feature is shown by its recovery of social connections that the motion-only social grouping technique otherwise misses.
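The comparison above is reported as true positives, false positives, and relative changes between the motion-only baseline and the motion-plus-attention variant. The following is a minimal sketch of one way such a comparison could be scored. It assumes that groups are evaluated pairwise, so that every unordered pair of pedestrians placed in the same group counts as one predicted social connection. That pairwise scoring, the helper names `to_pairs`, `count_tp_fp` and `relative_change`, and the toy group lists are all hypothetical illustrations, not the paper's evaluation protocol or data.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: a social connection is an unordered pair of pedestrian IDs.
Pair = FrozenSet[int]


def to_pairs(groups: Iterable[Iterable[int]]) -> Set[Pair]:
    """Expand each group of pedestrian IDs into all of its unordered member pairs."""
    pairs: Set[Pair] = set()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def count_tp_fp(predicted: Iterable[Iterable[int]],
                ground_truth: Iterable[Iterable[int]]) -> Tuple[int, int]:
    """Count predicted pairs that are (TP) and are not (FP) in the ground truth."""
    pred = to_pairs(predicted)
    truth = to_pairs(ground_truth)
    return len(pred & truth), len(pred - truth)


def relative_change(baseline: int, augmented: int) -> float:
    """Percentage change from baseline to augmented; negative means a reduction."""
    if baseline == 0:
        raise ValueError("baseline count must be non-zero")
    return 100.0 * (augmented - baseline) / baseline


if __name__ == "__main__":
    # Toy, made-up groupings purely to exercise the functions.
    truth = [[1, 2], [3, 4, 5]]
    motion_only = [[1, 2, 6], [3, 4]]
    with_gaze = [[1, 2], [3, 4, 5]]

    tp_m, fp_m = count_tp_fp(motion_only, truth)  # (2, 2)
    tp_g, fp_g = count_tp_fp(with_gaze, truth)    # (4, 0)
    print("TP change %:", relative_change(tp_m, tp_g))  # 100.0
    print("FP change %:", relative_change(fp_m, fp_g))  # -100.0
```

Scoring pairs rather than whole groups is one design choice among several. It gives partial credit when a method finds some but not all members of a group, and it counts a spurious member as extra false-positive connections. A group-level matching criterion would produce different numbers.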
