Urban tribes: Analyzing group photos from a social perspective

The explosive growth in image sharing via social networks has produced exciting opportunities for the computer vision community in areas including face, text, product and scene recognition. In this work we turn our attention to group photos of people and ask: what can we determine about the social subculture, or urban tribe, to which these people belong? To this end, we propose a framework employing low- and mid-level features to capture the visual attributes distinctive to a variety of urban tribes. We proceed in a semi-supervised manner, employing a metric that allows us to extrapolate from a small number of pairwise image similarities to induce a set of groups that visually correspond to familiar urban tribes such as biker, hipster or goth. Automatic recognition of such information in group photos offers the potential to improve recommendation services, context-sensitive advertising and other social analysis applications. We present promising preliminary experimental results that demonstrate our ability to categorize group photos in a socially meaningful manner.
