Improving Supervised Learning by Sample Decomposition

This paper introduces a new ensemble technique, cluster-based concurrent decomposition (CBCD), which induces an ensemble of classifiers by decomposing the training set into mutually exclusive, equal-size sub-samples. The CBCD algorithm first clusters the instance space using the K-means clustering algorithm. It then produces disjoint sub-samples from the clusters in such a way that each sub-sample contains tuples from every cluster and hence represents the entire dataset. An induction algorithm is applied to each subset in turn, and a voting mechanism combines the resulting classifiers' predictions. The CBCD algorithm has two tuning parameters: the number of clusters and the number of subsets to create. Using a suitable meta-learning procedure, these parameters can be tuned properly. In our experimental study, the CBCD algorithm with an embedded C4.5 learner outperformed the bagging algorithm at the same computational complexity.
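Concretely, the procedure has four steps: cluster the instance space with K-means, deal each cluster's instances out into disjoint sub-samples, train one classifier per sub-sample, and combine the members by majority vote. The following is a minimal sketch of that idea, assuming NumPy and scikit-learn are available; DecisionTreeClassifier stands in for the embedded C4.5 learner, and the names cbcd_fit and cbcd_predict are illustrative, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cbcd_fit(X, y, n_clusters=5, n_subsets=3, seed=0):
    """Train one tree per disjoint, cluster-stratified sub-sample (sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: cluster the instance space with K-means.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    # Step 2: deal each cluster's instances round-robin into n_subsets
    # disjoint sub-samples, so every sub-sample draws from every cluster.
    subsets = [[] for _ in range(n_subsets)]
    for c in range(n_clusters):
        members = rng.permutation(np.flatnonzero(labels == c))
        for i, idx in enumerate(members):
            subsets[i % n_subsets].append(idx)
    # Step 3: apply the base inducer to each sub-sample.
    return [DecisionTreeClassifier(random_state=seed)
            .fit(X[np.array(s)], y[np.array(s)]) for s in subsets]

def cbcd_predict(models, X):
    """Step 4: combine member predictions by majority vote
    (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])

Because the sub-samples are disjoint, training all n_subsets members costs roughly as much as training one classifier on the full set, which is what makes the comparison against bagging at equal computational complexity meaningful.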
