Optimized CNN Based Image Recognition Through Target Region Selection

Abstract Image recognition has plateaued in the last few years. In this field, complex models typically combine feature extraction and classification effectively, and many classic models already achieve realistic recognition performance. However, traditional methods still have drawbacks. On the one hand, unrelated regions of learning instances are often used, so effective features are overlooked. On the other hand, traditional CNN models do not consider the weights of learning instances, which reduces recognition accuracy. To address these problems, we propose an optimized CNN-based image recognition model. First, bottom-up region proposals are used to select the target region of each learning instance. Second, an enhancement-weight-based model is used to optimize the CNN so that it makes full use of different learning instances. Finally, experiments show that our method outperforms several traditional methods.
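
The abstract does not say how the instance weights are computed or applied. As a minimal sketch of the general idea only, the following assumes that each learning instance carries a non-negative weight and that the weight scales that instance's contribution to a cross-entropy loss. The function name `weighted_cross_entropy` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-instance cross-entropy losses.

    Illustrative assumption, not the paper's formulation:
    logits  -- (n_instances, n_classes) raw classifier scores
    labels  -- (n_instances,) integer class indices
    weights -- (n_instances,) non-negative instance weights, not all zero
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    weights = np.asarray(weights, dtype=float)

    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Negative log-probability of the true class for each instance.
    per_instance = -log_probs[np.arange(len(labels)), labels]

    # Instances with larger weights contribute more to the final loss.
    return float((weights * per_instance).sum() / weights.sum())
```

With all weights equal, this reduces to the ordinary mean cross-entropy; setting an instance's weight to zero removes it from the loss entirely.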
