THEORETICAL FOUNDATIONS AND EXPERIMENTAL RESULTS FOR A HIERARCHICAL CLASSIFIER WITH OVERLAPPING CLUSTERS

This paper proposes a classification framework built from simple classifiers organized in a tree-like structure. It is observed that simple classifiers, even though they have high error rates, find similarities among classes in the problem domain. The authors propose to exploit this property by identifying the classes that are mistaken for one another and constructing overlapping subproblems from them. The subproblems are then solved by further classifiers, which can themselves be very simple, yielding a hierarchical classifier (HC). It is shown that the HC, together with the proposed training algorithm and evaluation methods, performs well as a classification framework. It is also proven that such a construct achieves better accuracy than the root classifier it is built upon.
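
To make the construction concrete, below is a minimal sketch of the idea in Python, assuming scikit-learn-style estimators: the root classifier's confusion matrix is used to form overlapping groups of mutually mistaken classes, and one sub-classifier is trained per group. All names (build_hc, predict_hc, overlap_threshold) and the choice of GaussianNB as the simple base learner are illustrative assumptions, not taken from the paper.

    # Minimal sketch of a hierarchical classifier (HC) with overlapping
    # class groups. Names and base learner are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB


    def build_hc(X, y, overlap_threshold=0.05):
        """Fit a simple root classifier, read off its class confusions,
        and train one sub-classifier per (possibly overlapping) group."""
        root = GaussianNB().fit(X, y)
        classes = root.classes_
        # Row-normalised confusion matrix of the root on the training data.
        cm = confusion_matrix(y, root.predict(X), labels=classes)
        cm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
        subs = {}
        for i, c in enumerate(classes):
            # The group of c: c itself plus every class the root mistakes
            # it for often enough. Groups overlap, since a class may be
            # confused with several others.
            group = [classes[j] for j in range(len(classes))
                     if j == i or cm[i, j] >= overlap_threshold]
            if len(group) > 1:  # nothing to refine otherwise
                mask = np.isin(y, group)
                subs[c] = GaussianNB().fit(X[mask], y[mask])
        return root, subs


    def predict_hc(root, subs, X):
        """Route each sample through the root to its group's sub-classifier."""
        out = []
        for x, r in zip(X, root.predict(X)):
            out.append(subs[r].predict(x.reshape(1, -1))[0] if r in subs else r)
        return np.array(out)


    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    root, subs = build_hc(X_tr, y_tr)
    print("root accuracy:", root.score(X_te, y_te))
    print("HC accuracy:  ", (predict_hc(root, subs, X_te) == y_te).mean())

The sketch keeps the paper's two key ingredients: the subproblems are deliberately overlapping (a class can appear in several groups), and the second-level classifiers face fewer classes than the root, so they can stay very simple.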
