Large-Scale Aerial Image Categorization Using a Multitask Topological Codebook

Categorizing the millions of aerial images on Google Maps quickly and accurately is a useful technique in pattern recognition. Existing methods cannot handle this task successfully for two reasons: 1) the topology of an aerial image is the key feature for distinguishing its category, but it cannot be effectively encoded by a conventional visual codebook, and 2) it is challenging to build a real-time image categorization system, as some geo-aware apps update over 20 aerial images per second. To solve these problems, we propose an efficient aerial image categorization algorithm that learns a discriminative topological codebook of aerial images under a multitask learning framework. The pipeline can be summarized as follows. We first construct a region adjacency graph (RAG) that describes the topology of each aerial image. Aerial image categorization can then be formulated as RAG-to-RAG matching. According to graph theory, RAG-to-RAG matching is conducted by exhaustively comparing the graphlets (i.e., small subgraphs) of the two RAGs. To reduce the high time cost of this comparison, we propose to learn a codebook containing topologies that are jointly discriminative across multiple categories. The learned topological codebook guides the extraction of the discriminative graphlets. Finally, these graphlets are integrated into an AdaBoost model for predicting aerial image categories. Experimental results show that our approach is competitive with several existing recognition models. Furthermore, it processes over 24 aerial images per second, demonstrating that our approach is ready for real-world applications.
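The first two pipeline steps (building a RAG, then enumerating its graphlets) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the image has already been segmented into a 2D grid of integer region labels, it treats 4-connected pixels with different labels as adjacent regions, and the names `build_rag`, `is_connected`, and `enumerate_graphlets` are hypothetical. The brute-force enumeration is included only to show why exhaustive graphlet comparison is costly, which is the cost the learned codebook is meant to reduce.

```python
from itertools import combinations


def build_rag(labels):
    """Build a region adjacency graph from a 2D grid of region labels.

    Assumption: two regions are adjacent if any pair of 4-connected
    pixels carries their two labels. Returns {region: set(neighbors)}.
    """
    adj = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            adj.setdefault(a, set())
            # Checking only right and down neighbors visits each pixel pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if b != a:
                        adj[a].add(b)
                        adj.setdefault(b, set()).add(a)
    return adj


def is_connected(nodes, adj):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def enumerate_graphlets(adj, max_size):
    """Enumerate all connected node subsets with at most `max_size` nodes.

    Brute force over all subsets: the count grows combinatorially with
    the number of regions and with max_size.
    """
    graphlets = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(adj), k):
            if is_connected(combo, adj):
                graphlets.append(combo)
    return graphlets


# Toy 3x3 label map with four regions (hypothetical data).
toy_labels = [
    [0, 0, 1],
    [0, 2, 1],
    [3, 2, 1],
]
toy_rag = build_rag(toy_labels)
toy_graphlets = enumerate_graphlets(toy_rag, max_size=3)
```

Even on this four-region toy map the enumeration yields 13 graphlets of size 1 to 3. A real aerial image with many more regions produces far more, which suggests why restricting extraction to codebook-selected topologies matters for real-time use.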
