Image Annotation based on Deep Hierarchical Context Networks

Context modeling is one of the most fertile subfields of visual recognition; it aims at designing discriminant image representations that incorporate their intrinsic and extrinsic relationships. However, the potential of context modeling remains underexplored, and most existing solutions are either context-free or restricted to simple handcrafted geometric relationships. In this paper we introduce DHCN, a novel Deep Hierarchical Context Network that leverages different sources of context, including geometric and semantic relationships. The proposed method is based on the minimization of an objective function that mixes a fidelity term, a context criterion and a regularizer. The solution of this objective function defines the architecture of a bi-level hierarchical context network: the first level captures scene geometry, while the second corresponds to semantic relationships. We solve this representation learning problem by training the underlying deep network, whose parameters correspond to the most influential bi-level contextual relationships, and we evaluate its performance on image annotation using the challenging ImageCLEF benchmark.
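
To make the three-term structure of the objective concrete, a purely schematic form can be written down. The abstract does not state the actual formulation, so every symbol below is an assumption introduced for illustration only: \Phi stands for the learned image representation, X for the input visual features, \{P_c\} for a set of geometric and semantic relationships, and \alpha, \beta \ge 0 for hypothetical weights that balance the terms.

```latex
% Schematic only: F, C, R, \Phi, X, P_c, \alpha and \beta are assumed names,
% not the notation or the exact formulation of the paper.
\min_{\Phi} \;
  \underbrace{F(\Phi;\, X)}_{\text{fidelity term}}
  \;+\; \alpha \,\underbrace{C\big(\Phi;\, \{P_c\}_c\big)}_{\text{context criterion}}
  \;+\; \beta \,\underbrace{R(\Phi)}_{\text{regularizer}}
```

Read this way, the fidelity term keeps \Phi close to the observed content, the context criterion rewards agreement between related image regions, and the regularizer controls the complexity of the solution. The specific choices of F, C and R that give rise to the bi-level network are those of the paper and are not reproduced here.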
