Mining Compositional Features From GPS and Visual Cues for Event Recognition in Photo Collections

As digital cameras with Global Positioning System (GPS) capability become available, and as people increasingly geotag their photos by other means, it is of great interest to annotate semantic events (e.g., hiking, skiing, party) characterized by a collection of geotagged photos that carry timestamps and GPS coordinates recorded at capture time. We address this emerging event classification problem by mining informative features derived from two sources: image content, and the spatio-temporal traces of GPS coordinates that characterize the underlying movement patterns of different event types. Both kinds of features are computed over the entire collection rather than over individual photos. Because events are better described by the co-occurrence of objects and scenes, we bundle primitive features, such as color and texture histograms or GPS features, to form discriminative compositional features. We propose a data mining method that efficiently discovers discriminative compositional features with small classification errors, and we present a theoretical analysis to guide the selection of the data mining parameters. After mining the compositional features, we apply multiclass AdaBoost to further integrate them. Finally, the GPS and visual modalities are combined through confidence-based fusion. Experimental results on a dataset of more than 3,000 geotagged images demonstrate the synergy of all components of the proposed approach to event classification.
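The idea of bundling primitive features into compositional features can be illustrated with a small sketch. It assumes that each primitive feature has already been binarized (for example, a color-histogram bin or a GPS speed statistic thresholded by a decision stump), that a compositional feature is a conjunction (AND) of such primitives, and that a candidate is kept when it covers enough training collections (a hypothetical `min_support`) and predicts one event class with high precision (a hypothetical `min_confidence`). The exhaustive enumeration below stands in for an efficient frequent-pattern miner and is not the authors' algorithm; the function name, parameters, and toy data are illustrative only.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# (primitive ids, predicted class, support, confidence)
Rule = Tuple[Tuple[int, ...], str, float, float]


def mine_compositional_features(
    samples: Sequence[FrozenSet[int]],
    labels: Sequence[str],
    max_order: int = 2,
    min_support: float = 0.1,
    min_confidence: float = 0.8,
) -> List[Rule]:
    """Hypothetical sketch: return every conjunction of up to max_order
    primitive features that fires on at least min_support of the collections
    and predicts a single class with precision >= min_confidence.

    Each sample is the set of primitive-feature ids that fire for one photo
    collection; labels[i] is that collection's event class.
    """
    n = len(samples)
    if n == 0:
        return []
    universe = sorted(set().union(*samples))
    classes = sorted(set(labels))
    rules: List[Rule] = []
    # Brute-force enumeration; a real miner would prune with FP-growth or Apriori.
    for order in range(1, max_order + 1):
        for combo in combinations(universe, order):
            items = set(combo)
            # Labels of the collections in which every primitive in combo fires.
            covered = [y for s, y in zip(samples, labels) if items <= s]
            support = len(covered) / n
            if not covered or support < min_support:
                continue
            for c in classes:
                # Training precision of the rule "combo fires => class c".
                confidence = covered.count(c) / len(covered)
                if confidence >= min_confidence:
                    rules.append((combo, c, support, confidence))
    return rules


if __name__ == "__main__":
    # Toy data: primitive ids 0, 1, 2 stand for binarized visual/GPS cues.
    samples = [frozenset({0, 1}), frozenset({0, 1, 2}), frozenset({0, 2}),
               frozenset({1, 2}), frozenset({0, 1})]
    labels = ["hiking", "hiking", "party", "party", "hiking"]
    print(mine_compositional_features(samples, labels, max_order=2,
                                      min_support=0.4, min_confidence=1.0))
    # → [((0, 1), 'hiking', 0.6, 1.0)]
```

In the toy data, neither primitive 0 nor primitive 1 alone separates hiking from party (each reaches only 75% precision), but their conjunction covers three collections that are all labeled hiking. This is the co-occurrence effect that motivates compositional features; in a full pipeline, the retained conjunctions would then be integrated by a multiclass booster.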