Bikers Are Like Tobacco Shops, Formal Dressers Are Like Suits: Recognizing Urban Tribes with Caffe

Recognizing people's social styles is an interesting but relatively unexplored task. Recognizing "style" appears to be quite a different problem from categorization: it is like recognizing a letter's font as opposed to recognizing the letter itself, in that similar-looking things must be mapped to different categories. Hence, a priori, it would appear that features that are good for categorization should not be good for style recognition. Here we show this is not the case by starting with a deep convolutional network pre-trained on ImageNet (Caffe), a categorization problem, and using its features as input to a classifier for urban tribes. By combining the results from the individuals in group pictures with those from the group itself, together with some fine-tuning of the network, we reduce the previous state-of-the-art error by almost half, raising the recognition rate from 46% to 71%. To explore how the networks perform this task, we compute the mutual information between the ImageNet output category activations and the urban tribe categories, and find, for example, that bikers are well categorized as tobacco shops by Caffe, and that better-recognized social groups have more highly correlated ImageNet categories. This gives us insight into the features useful for categorizing urban tribes.
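The mutual information analysis mentioned above could be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not say how the activations are discretized, so the `threshold` binarization, the function names, and the toy data are all assumptions made for the example.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete 1-D label arrays."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Joint distribution estimated from co-occurrence counts.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def rank_categories(activations, tribe_labels, tribe, threshold=0.1):
    """Rank output categories by mutual information with one tribe.

    activations: (n_images, n_categories) array of output activations.
    threshold:   hypothetical cut-off used to binarize activations.
    """
    active = np.asarray(activations) > threshold
    is_tribe = np.asarray(tribe_labels) == tribe
    scores = np.array([mutual_information(active[:, j], is_tribe)
                       for j in range(active.shape[1])])
    return np.argsort(scores)[::-1], scores

# Toy data (made up): 4 images, 3 output categories.
acts = np.array([[0.0, 0.9, 0.5],
                 [0.0, 0.8, 0.5],
                 [0.0, 0.0, 0.5],
                 [0.0, 0.0, 0.5]])
labels = np.array(["biker", "biker", "hipster", "hipster"])
order, scores = rank_categories(acts, labels, "biker")
```

In this toy example only the second category fires exclusively on the "biker" images, so it is the one that carries information about that tribe; categories that are always on or always off score zero.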
