Paper information - Weakly supervised learning of deformable part models and convolutional neural networks for object detection

In this dissertation we address the problem of weakly supervised object detection, wherein the goal is to recognize and localize objects in weakly labeled images where object-level annotations are incomplete during training. To this end, we propose two methods that learn two different models for the objects of interest. In our first method, we enhance weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of the location and size of the initial class-specific root filter. We first compute a candidate pool of potential object locations to serve as the root filter estimate, using a generic objectness measure (region proposals) to combine the most salient regions with “good” region proposals. We then cast learning the latent class label of each candidate window as a binary classification problem, training category-specific classifiers that coarsely classify a candidate window as either a target object or a non-target class. Furthermore, we improve detection by incorporating contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure to modify the DPM outputs, which can effectively match the approximate object aspect ratios and further improve final accuracy. Second, we investigate how knowledge about object similarities from both visual and semantic domains can be transferred to adapt an image classifier to an object detector in a semi-supervised setting on a large-scale database, where a subset of object categories is annotated with bounding boxes. We propose to transform deep Convolutional Neural Network (CNN)-based image-level classifiers into object detectors by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations.
We have evaluated both approaches extensively on several challenging detection benchmarks, e.g., PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both compare favorably to the state of the art and show significant improvement over several other recent weakly supervised detection methods.
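
The classifier-to-detector transfer described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes each category is represented by one weight vector, that the classifier/detector difference is a plain vector subtraction, and that a single precomputed similarity score (standing in for the combined visual and semantic similarity) is available per annotated category. The function name `transfer_detector_weights` and the parameter `k` are hypothetical.

```python
import numpy as np

def transfer_detector_weights(w_cls_target, w_cls_src, w_det_src, similarity, k=2):
    """Illustrative sketch: adapt a classifier weight vector toward a detector.

    w_cls_target: (d,) classifier weights of a category without bounding boxes
    w_cls_src:    (n, d) classifier weights of categories with bounding boxes
    w_det_src:    (n, d) detector weights of those same categories
    similarity:   (n,) non-negative similarity of the target to each source
                  category (assumed to be computed elsewhere)
    k:            number of most similar source categories to use (assumption)
    """
    # Per-category difference between detector and classifier weights.
    diffs = w_det_src - w_cls_src
    # Indices of the k most similar source categories.
    top = np.argsort(similarity)[::-1][:k]
    total = similarity[top].sum()
    if total <= 0:
        # No usable similarity information: leave the classifier unchanged.
        return w_cls_target.copy()
    # Similarity-weighted average of the differences, added to the classifier.
    weights = similarity[top] / total
    return w_cls_target + weights @ diffs[top]
```

In this sketch, a category that is most similar to annotated categories whose detectors differ strongly from their classifiers receives a correspondingly large adjustment; how the similarities are actually measured and combined is left open here.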

Yuxing Tang
