MF-Tree: Matrix Factorization Tree for Large Multi-Class Learning

Many big data applications require accurate classification of objects into one of possibly thousands or millions of categories. Such classification tasks are challenging because of class imbalance, high testing cost, and poor model interpretability. To overcome these challenges, we propose MF-Tree, a novel hierarchical learning method that efficiently classifies data sets with a large number of classes while simultaneously inducing a taxonomy structure that captures relationships among the classes. Unlike many existing hierarchical learning methods, our approach is designed to optimize a global objective function. We demonstrate the equivalence between our proposed regularized loss function and the Hilbert-Schmidt Independence Criterion (HSIC). HSIC has an additive property that allows us to decompose the multi-class learning problem into hierarchical binary classification tasks. To improve training efficiency, we also propose an approximate algorithm for inducing MF-Tree. We performed extensive experiments comparing MF-Tree against several state-of-the-art algorithms, and the results show both its effectiveness and its efficiency on real-world data sets.
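To make the HSIC connection concrete, the following minimal sketch computes the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2, and checks one simple sense in which it is additive: it is linear in the label kernel, so a one-hot label kernel splits into one term per class. This is an illustration only, not the MF-Tree algorithm. The synthetic data, the linear kernels, and the helper names `centering_matrix` and `empirical_hsic` are assumptions made for the example, and the additive property exploited by MF-Tree may take a different form.

```python
import numpy as np


def centering_matrix(n):
    """Return H = I - (1/n) * 11^T, which centers a kernel matrix."""
    return np.eye(n) - np.ones((n, n)) / n


def empirical_hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = centering_matrix(n)
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Synthetic data (assumption): 4 classes whose feature means depend on the label.
rng = np.random.default_rng(0)
n, d, num_classes = 60, 5, 4
labels = rng.integers(0, num_classes, size=n)
X = rng.normal(size=(n, d)) + labels[:, None]

K = X @ X.T                       # linear kernel on the features
Y = np.eye(num_classes)[labels]   # one-hot label matrix, shape (n, num_classes)
L = Y @ Y.T                       # linear kernel on the one-hot labels

# HSIC against the full label kernel.
total = empirical_hsic(K, L)

# Since L = sum_c y_c y_c^T, HSIC splits into one binary (class c vs. rest) term per class.
parts = [
    empirical_hsic(K, np.outer(Y[:, c], Y[:, c]))
    for c in range(num_classes)
]

print("HSIC with full label kernel:", total)
print("Sum of per-class terms:     ", sum(parts))
```

The two printed values agree up to floating-point error because the trace is linear in its arguments. Each per-class term measures the dependence between the features and a single binary class indicator.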
