The Utility of Clustering in Prediction Tasks

We explore the utility of clustering for reducing error in prediction tasks. Previous work has hinted that pre-processing data with clustering algorithms can improve prediction accuracy. In this work we investigate more deeply the direct utility of clustering for improving prediction accuracy and offer explanations for why this may be so. Across a number of datasets, we run k-means at different scales (values of k) and train predictors at each scale, producing one set of predictions per scale. These predictions are then combined by a naïve ensemble. We observed that using predictors in conjunction with clustering improved prediction accuracy on most datasets. We believe this indicates the predictive utility of exploiting structure in the data and of the data compression provided by clustering. We also found that this method improves upon the predictions of even a Random Forests predictor, which suggests that it provides a novel and useful source of variance in the prediction process.
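The pipeline described above (cluster at several scales, predict within each cluster, average the per-scale predictions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the base predictor or the ensemble weights, so this sketch assumes a per-cluster mean predictor and an unweighted average across scales, with a small hand-rolled k-means for self-containedness.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means; returns the final centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def cluster_then_predict(X_train, y_train, X_test, ks=(2, 4, 8)):
    """For each scale k: cluster the training data, fit a per-cluster
    predictor (here just the cluster's mean target), predict test points
    via their nearest centroid, then naively average across scales."""
    per_scale_preds = []
    for k in ks:
        cents = kmeans(X_train, k)
        train_lab = np.argmin(((X_train[:, None] - cents[None]) ** 2).sum(-1), axis=1)
        test_lab = np.argmin(((X_test[:, None] - cents[None]) ** 2).sum(-1), axis=1)
        # Per-cluster predictor: mean target, falling back to the global
        # mean for any empty cluster.
        cluster_means = np.array([
            y_train[train_lab == j].mean() if (train_lab == j).any()
            else y_train.mean()
            for j in range(k)
        ])
        per_scale_preds.append(cluster_means[test_lab])
    # Naive (unweighted) ensemble over the k scales.
    return np.mean(per_scale_preds, axis=0)
```

In practice any regressor could be fit per cluster in place of the cluster mean; the point of the sketch is only the structure: one predictor set per clustering scale, combined by a simple average.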
