Understanding images of groups of people

Images of groups of people are captured in many social settings. The structure of such a group provides meaningful context for reasoning about the individuals in it and about the scene as a whole. For example, men are more likely than women to stand at the edge of an image. Instead of treating each face independently of all the others, we introduce contextual features that encapsulate the group structure both locally (for each person in the group) and globally (for the group as a whole). This “social context” allows us to accomplish a variety of tasks, such as demographic recognition, estimation of scene and camera parameters, and even event recognition. We perform human studies to show that this context aids the recognition of demographic information in images of strangers.
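To make the distinction between local and global context concrete, the following is a minimal sketch, not the features used in this work. It assumes each detected face is given as a hypothetical `(x, y, size)` tuple in pixels, and the specific feature choices (nearest-neighbor distance, offset from the group centroid, relative face size, face count, horizontal spread) are illustrative assumptions only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical face representation: (x_center, y_center, face_size) in pixels.
Face = Tuple[float, float, float]


def local_features(faces: List[Face]) -> List[Dict[str, float]]:
    """Per-face context: position and size relative to the rest of the group."""
    if not faces:
        raise ValueError("at least one face is required")
    mean_size = sum(f[2] for f in faces) / len(faces)
    cx = sum(f[0] for f in faces) / len(faces)
    cy = sum(f[1] for f in faces) / len(faces)
    features = []
    for i, (x, y, size) in enumerate(faces):
        # Distance to the closest other face, in units of mean face size.
        nearest = min(
            (math.hypot(x - ox, y - oy)
             for j, (ox, oy, _) in enumerate(faces) if j != i),
            default=math.inf,  # a lone face has no neighbor
        )
        features.append({
            "nearest_neighbor": nearest / mean_size,
            "dx_from_center": (x - cx) / mean_size,
            "dy_from_center": (y - cy) / mean_size,
            "relative_size": size / mean_size,
        })
    return features


def global_features(faces: List[Face]) -> Dict[str, float]:
    """Whole-group context: how many faces there are and how widely they spread."""
    if not faces:
        raise ValueError("at least one face is required")
    mean_size = sum(f[2] for f in faces) / len(faces)
    xs = [f[0] for f in faces]
    return {
        "num_faces": float(len(faces)),
        "horizontal_spread": (max(xs) - min(xs)) / mean_size,
    }


if __name__ == "__main__":
    group = [(0.0, 0.0, 10.0), (20.0, 0.0, 10.0), (40.0, 0.0, 10.0)]
    print(local_features(group)[0])
    print(global_features(group))
```

Dividing distances by the mean face size is one simple way to make such features roughly independent of image resolution and camera distance. A large magnitude of `dx_from_center` marks a person standing near the edge of the group, the kind of cue the edge-position example above refers to.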
