Large Scale Classification in Deep Neural Network with Label Mapping

In recent years, deep neural networks have been widely used in machine learning, and multi-class classification is one of the field's core problems. However, to solve a multi-class classification problem effectively, the required network size must grow hyper-linearly with the number of classes. It is therefore infeasible to solve such a problem with a deep neural network when the number of classes is huge. This paper presents a method, called Label Mapping (LM), that addresses this by decomposing the original classification problem into several smaller sub-problems which are theoretically solvable. Our method is an ensemble method like error-correcting output codes (ECOC), but it allows the base learners to be multi-class classifiers with different numbers of class labels. We propose two design principles for LM: the first is to maximize the number of base classifiers that can separate any two different classes, and the second is to keep the base learners as independent as possible in order to reduce redundant information. Based on these principles, two different LM algorithms are derived using number theory and information theory. Since each base learner can be trained independently, our method scales easily to a large-scale training system. Experiments show that the proposed method significantly outperforms standard one-hot encoding and ECOC in terms of accuracy and model complexity.
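To make the decomposition idea concrete, here is a minimal Python sketch of one possible number-theoretic label mapping. It is an illustration only, not the algorithm from the paper, whose details the abstract does not give. The sketch assumes each base learner predicts the class index modulo one of several pairwise coprime moduli; the moduli, the function names `encode` and `decode`, and the agreement-count decoding rule are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import gcd, prod

# Hypothetical setup: 1000 classes, five base learners.
NUM_CLASSES = 1000
MODULI = [7, 11, 13, 15, 16]  # assumed pairwise coprime moduli

def encode(label, moduli=MODULI):
    """Map a class index to one smaller label per base learner."""
    return tuple(label % m for m in moduli)

def decode(residues, moduli=MODULI, num_classes=NUM_CLASSES):
    """Return the class whose code agrees with the most base-learner outputs."""
    def agreement(c):
        return sum(r == c % m for r, m in zip(residues, moduli))
    return max(range(num_classes), key=agreement)

# The moduli are pairwise coprime, and the product of the three smallest
# exceeds the largest possible difference between two class indices.
# Two distinct classes can therefore share at most 2 of the 5 residues,
# so at least 3 base learners separate any pair of classes.
assert all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))
assert prod(sorted(MODULI)[:3]) > NUM_CLASSES - 1

code = encode(123)                    # (4, 2, 6, 3, 11)
noisy = (code[0] + 1,) + code[1:]     # simulate one base learner being wrong
recovered = decode(noisy)             # still 123
total_outputs = sum(MODULI)           # 62 output units instead of 1000
```

In this sketch each base learner is a classifier over at most 16 labels, the five learners together need 62 output units rather than the 1000 of a one-hot layer, and a single wrong base learner can still be corrected because any two classes differ in at least three positions. This mirrors the first design principle (many base classifiers separating each pair of classes) under the stated assumptions.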
