Learning human photo shooting patterns from large-scale community photo collections

Social photo sharing platforms on the Internet (e.g. Flickr) host billions of publicly accessible photos captured by millions of individual users from all over the world. These user-contributed, geo-tagged photo collections offer insights into human sociocultural life and important clues for understanding people’s engagement with, and reactions to, places and events around the world today. In this paper, we analyze over 2 million geo-tagged images uploaded by 12,000 individual Flickr users to investigate the photo shooting patterns of different user groups: tourists and locals, Asian and European users, and male and female users. Specifically, we use visual features extracted from single monocular images, together with their spatial configurations, to infer 3D depth information for the photographs and thereby establish the preferred shooting scale (close-up or distant) of each user group. The results reveal which objects and scenes interest different groups of people and how these preferences change over space and time. As such, the research offers a new approach for the human sciences, which study individuals, groups and society.
