Prediction-based classification using learning on Riemannian manifolds

This paper is concerned with learning from predictions. The predictions are obtained from an ensemble of classifiers such as a random forest (RF) or extra-trees. The estimators are assumed to be approximately independent, so that together they span a prediction space: the feature vector is projected onto the space of estimators, yielding a response from each of them. For an RF, these responses are conditional class probabilities, and they can be viewed as projections onto quasi-orthogonal directions given by the individual decision trees. A connected Riemannian manifold is then constructed by computing, for each class, the matrix of pairwise products of the predictions of all trees in the RF. These matrices are symmetric and positive definite, which is a necessary and sufficient condition for them to lie on a connected Riemannian manifold (R manifold). Because the tree outputs are conditional probabilities, one such matrix is created per class. Stacking these matrices yields a tensor that is passed to a convolutional neural network (CNN) for learning. We tested the algorithm on 11 difficult classification datasets from the UCI repository. The results show very fast learning and rapid convergence of both the loss and the prediction accuracy, and the proposed algorithm outperforms the feature-based classical classifier ensembles (RFs and extra-trees) on every tested dataset.
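
A minimal sketch of the tensor-construction step described above, assuming a scikit-learn RandomForestClassifier stands in for the ensemble; the reading of "pairwise products of predictions" as the outer product of the vector of per-tree class probabilities is our assumption, and the function name `prediction_tensor` is illustrative, not from the paper.

```python
# Sketch: per-tree class probabilities -> per-class matrices of pairwise
# products -> stacked tensor (the object that would be fed to a CNN).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def prediction_tensor(forest, x):
    """Return an array of shape (n_classes, n_trees, n_trees).

    Slice c is the matrix of pairwise products of the per-tree
    probabilities for class c, i.e. the outer product p_c p_c^T.
    """
    # Per-tree conditional class probabilities for one sample x,
    # shape (n_trees, n_classes).
    tree_probs = np.stack(
        [tree.predict_proba(x.reshape(1, -1))[0] for tree in forest.estimators_]
    )
    n_trees, n_classes = tree_probs.shape
    tensor = np.empty((n_classes, n_trees, n_trees))
    for c in range(n_classes):
        p = tree_probs[:, c]
        tensor[c] = np.outer(p, p)  # symmetric, positive semidefinite
    return tensor


X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
T = prediction_tensor(rf, X[0])
print(T.shape)  # (3, 50, 50): one n_trees x n_trees matrix per class
```

Note that for a single sample each slice is a rank-one positive semidefinite matrix; the strict positive definiteness claimed in the abstract would depend on how the matrices are regularized or aggregated, which the sketch does not attempt to reproduce.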
