Can a machine have two systems for recognition, like human beings?

Abstract Artificial Intelligence has attracted much of researchers’ attention in recent years. A question we always ask is: “Can machines replace human beings to some extent?” This paper aims to explore the knowledge learning for an image-annotation framework, which is an easy task for humans but a tough task for machines. This paper’s research is based on an assumption that machines have two systems of thinking, each of which handles the labels of images at different abstract levels. Based on this, a new hierarchical model for image annotation is introduced. We explore not only the relationships between the labels and the features used, but also the relationships between labels. More specifically, we divide labels into several hierarchies for efficient and accurate labeling, which are constructed using our Associative Memory Sharing method, proposed in this paper.

[1]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2]  Feiping Nie,et al.  Semi-supervised learning on large-scale geotagged photos for situation recognition , 2017, J. Vis. Commun. Image Represent..

[3]  Gabriela Csurka,et al.  Tree-Structured CRF Models for Interactive Image Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Bernhard Schölkopf,et al.  Learning with Hypergraphs: Clustering, Classification, and Embedding , 2006, NIPS.

[5]  Edoardo Pasolli,et al.  An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[6]  C. V. Jawahar,et al.  Self-Supervised Learning of Visual Features through Embedding Images into Text Topic Spaces , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Eric P. Xing,et al.  Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity , 2009, ICML.

[8]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Ramesh C. Jain,et al.  Using Photos as Micro-Reports of Events , 2016, ICMR.

[11]  Yangqing Jia,et al.  Deep Convolutional Ranking for Multilabel Image Annotation , 2013, ICLR.

[12]  Antonio Torralba,et al.  Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, CVPR 2004.

[15]  Chien-Li Chou,et al.  Effective Semantic Annotation by Image-to-Concept Distribution Model , 2011, IEEE Transactions on Multimedia.

[16]  Jian Yu,et al.  Object recognition via contextual color attention , 2015, J. Vis. Commun. Image Represent..

[17]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Qi Tian,et al.  Multi-label boosting for image annotation by structural grouping sparsity , 2010, ACM Multimedia.

[19]  Kin-Man Lam,et al.  An efficient two-stage framework for image annotation , 2013, Pattern Recognit..

[20]  Greg Mori,et al.  Learning Structured Inference Neural Networks with Label Relations , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Qiang Li,et al.  Conditional Graphical Lasso for Multi-label Image Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Wei Xu,et al.  CNN-RNN: A Unified Framework for Multi-label Image Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jiebo Luo,et al.  Learning multi-label scene classification , 2004, Pattern Recognit..

[26]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[27]  Abbes Amira,et al.  Semantic content-based image retrieval: A comprehensive study , 2015, J. Vis. Commun. Image Represent..

[28]  Xiaogang Wang,et al.  Deeply learned attributes for crowded scene understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Xin Li,et al.  Multi-label Image Classification with A Probabilistic Label Enhancement Model , 2014, UAI.

[30]  Samy Bengio,et al.  A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Kin-Man Lam,et al.  Constructing a hierarchical tree for image annotation , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[32]  Antonio Torralba,et al.  Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes , 2003, NIPS.

[33]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Xiaogang Wang,et al.  Object Detection from Video Tubelets with Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Kin-Man Lam,et al.  A hierarchical algorithm for image multi-labeling , 2010, 2010 IEEE International Conference on Image Processing.

[37]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Ning Zhou,et al.  A Hybrid Probabilistic Model for Unified Collaborative and Content-Based Image Tagging , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[40]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.