Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection

This paper develops a cost-sensitive learning algorithm for training hierarchical tree classifiers for large-scale image classification (i.e., categorizing large numbers of images into thousands of object classes). A visual tree is first constructed to organize large numbers of object classes hierarchically and to identify inter-related learning tasks automatically. The fine-grained object classes at sibling leaf nodes share significant common visual properties but still differ in subtle ways, so a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly and enhance their discrimination power. For the coarse-grained categories (i.e., groups of visually similar object classes) at sibling non-leaf nodes, a hierarchical learning algorithm is developed that leverages the tree structure (by adding two inter-level constraints) to train their inter-related classifiers jointly and to control inter-level error propagation effectively. To detect large numbers of object classes more robustly, a visual forest is learned by combining multiple visual trees (for different configurations) and their hierarchical tree classifiers. By penalizing different types of misclassification errors differently, a cost-sensitive learning approach is further developed to detect the appearance of new object classes accurately, and an incremental learning algorithm is developed to train discriminative classifiers for new object classes more effectively. Our experimental results demonstrate that the cost-sensitive hierarchical learning algorithm achieves very competitive results in both classification accuracy and computational efficiency compared with other state-of-the-art techniques.
Highlights
Visual tree to organize large-scale object classes hierarchically and determine inter-related learning tasks automatically.
Multi-task structural learning for joint classifier training to enhance discrimination power significantly.
Hierarchical learning to leverage inter-level constraints for classifier training and to limit inter-level error propagation.
Task and tree parallelism to scale up the hierarchical learning algorithm for large-scale image classification.
Cost-sensitive learning and incremental learning to detect new object classes and train their classifiers more effectively.
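The idea of penalizing different misclassification errors differently, and of flagging a possible new class, can be illustrated with a small sketch. This is not the paper's algorithm: the toy tree in `PARENT`, the tree-distance cost in `tree_cost`, the fixed `novel_cost`, and the function names are all hypothetical choices made for illustration. The sketch assumes class probabilities are already available, treats confusing two siblings as cheaper than confusing classes in different branches, and returns "novel" when no known class has an expected cost below `novel_cost`.

```python
from typing import Dict, List

# Hypothetical toy visual tree (child -> parent); not taken from the paper.
PARENT: Dict[str, str] = {
    "cat": "animal", "dog": "animal",
    "car": "vehicle", "bus": "vehicle",
    "animal": "root", "vehicle": "root",
}


def path_to_root(node: str) -> List[str]:
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_cost(true_label: str, predicted: str) -> int:
    """Assumed cost: number of tree edges between the two leaf nodes."""
    a, b = path_to_root(true_label), path_to_root(predicted)
    common = set(a) & set(b)
    # Steps from each leaf up to the lowest common ancestor.
    da = next(i for i, n in enumerate(a) if n in common)
    db = next(i for i, n in enumerate(b) if n in common)
    return da + db


def expected_cost(probs: Dict[str, float], candidate: str) -> float:
    """Expected cost of predicting `candidate` under the given probabilities."""
    return sum(p * tree_cost(t, candidate) for t, p in probs.items())


def decide(probs: Dict[str, float], novel_cost: float) -> str:
    """Pick the known class with minimum expected cost, or 'novel' if
    every known class costs at least `novel_cost` (an assumed constant)."""
    best_label, best_cost = "novel", novel_cost
    for candidate in probs:
        cost = expected_cost(probs, candidate)
        if cost < best_cost:
            best_label, best_cost = candidate, cost
    return best_label


if __name__ == "__main__":
    confident = {"cat": 0.9, "dog": 0.05, "car": 0.03, "bus": 0.02}
    uncertain = {"cat": 0.25, "dog": 0.25, "car": 0.25, "bus": 0.25}
    print(decide(confident, novel_cost=1.0))
    print(decide(uncertain, novel_cost=1.0))
```

In this sketch a confident prediction is accepted because its expected cost stays low, while a flat distribution over all known classes pushes every expected cost above the rejection threshold, which is one simple way to read "new object class" out of a cost-sensitive decision rule.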
