Learning by abstraction: Hierarchical classification model using evidential theoretic approach and Bayesian ensemble model

Abstraction is one of the most powerful basic techniques for solving complex problems. In this paper we use abstraction together with hierarchical learning to propose a new classification model called "Learning by Abstraction" (LA). The key idea of LA is to apply both supervised and unsupervised learning algorithms to complex classification problems. The proposed model is also useful in semi-supervised learning settings in which only the high-level category of some training instances is known. In the learning mode, the model finds the nearest classes and merges them into a new abstract class. This new abstract class, together with the remaining classes, forms a new abstraction level, and a new learner is trained to perform the classification task at that level. In the recall mode, a new instance is classified by combining the decisions of these level-wise classifiers through a new classifier ensemble model based on Dempster-Shafer theory and a Bayesian ensemble model. The simulation results show that the proposed model has two major advantages. First, it can improve the correct classification rate (CCR) of an ordinary classifier, especially in complex classification tasks with high-dimensional feature vectors and many target classes. Second, the model is robust to noise, and its CCR improvement grows as the noise level of the data increases. The proposed model has also been evaluated on a real protein fold pattern recognition data set, where it improved the correct classification rate of an RBF neural network by about 10%.
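To make the learning/recall procedure concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: the two "nearest" classes are identified by the distance between their feature means, a logistic-regression classifier stands in for each level's learner, and the decision fusion is simplified to a naive Bayesian product rule rather than the paper's full Dempster-Shafer/Bayesian ensemble combination.

```python
# Minimal sketch of the hierarchical-abstraction idea (illustrative only).
# Assumptions: scikit-learn is available; "nearest classes" = smallest distance
# between class means; fusion is a simple product rule, not Dempster-Shafer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_abstraction_levels(X, y, n_levels=3):
    """Train one classifier per abstraction level, merging the two closest
    classes before each new level. Returns a list of (classifier, mapping)
    pairs, where mapping sends an original label to its label at that level."""
    levels = []
    mapping = {c: c for c in np.unique(y)}   # level-0 mapping is the identity
    labels = y.copy()
    for _ in range(n_levels):
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        levels.append((clf, dict(mapping)))
        classes = np.unique(labels)
        if len(classes) <= 2:
            break
        # Merge the two classes whose feature means are closest ("nearest classes").
        means = {c: X[labels == c].mean(axis=0) for c in classes}
        a, b = min(((p, q) for i, p in enumerate(classes) for q in classes[i + 1:]),
                   key=lambda pq: np.linalg.norm(means[pq[0]] - means[pq[1]]))
        labels = np.where(labels == b, a, labels)                       # merged labels
        mapping = {orig: (a if lvl == b else lvl) for orig, lvl in mapping.items()}
    return levels

def predict_fused(levels, x):
    """Combine the level-wise posteriors with a product (naive Bayesian) rule."""
    originals = sorted(levels[0][1].keys())
    fused = np.ones(len(originals))
    for clf, mapping in levels:
        probs = dict(zip(clf.classes_, clf.predict_proba(x.reshape(1, -1))[0]))
        fused *= np.array([probs[mapping[c]] for c in originals])
    return originals[int(np.argmax(fused))]
```

For instance, on a 10-class training set, build_abstraction_levels(X_train, y_train, n_levels=3) would produce classifiers over 10, 9, and 8 classes, and predict_fused would combine their posteriors for each test instance; the paper's actual fusion rule additionally models classifier uncertainty via Dempster-Shafer belief functions.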
