Paper information - A clustering rule-based approach to predictive modeling

A clustering rule-based approach to predictive modeling

Recent work on rule-based classifiers and on clustering data before learning has helped improve classification accuracy in predictive modeling tasks. This research introduces an approach that combines the two techniques and studies its predictive effect. The algorithm presented here, a Clustering Rule-based Algorithm (CRA), first clusters the original training set using an Expectation Maximization (EM) algorithm. A separate Classification and Regression Tree (CART) is then trained on each individual cluster. To obtain an upper bound on accuracy, each test instance is evaluated against all of the rules produced by every tree, to determine whether any tree produces a rule that correctly classifies the instance. The study reveals that, under this upper-bound evaluation, a predictive accuracy of 100% was achievable. Moreover, the approach exploits the advantages of both supervised and unsupervised learning to produce a more powerful and more accurate predictive model.
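The three steps described in the abstract (cluster, train one tree per cluster, score against all trees) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a simplified spherical-Gaussian EM and a small Gini-based tree stand in for the EM and CART components, and every name (`em_cluster`, `build_tree`, `train_cra`, `oracle_upper_bound`), the synthetic data, and the parameters `k` and `depth` are hypothetical choices made for the example.

```python
import math
import random
from collections import Counter


def em_cluster(X, k, iters=30, seed=0):
    """Hard cluster labels from EM on a mixture of spherical Gaussians (simplified)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [list(x) for x in rng.sample(X, k)]
    var, weight = [1.0] * k, [1.0 / k] * k
    for _ in range(iters):
        resp = []  # E-step: responsibilities, computed in log space
        for x in X:
            logp = [math.log(weight[j]) - 0.5 * d * math.log(2 * math.pi * var[j])
                    - sum((a - b) ** 2 for a, b in zip(x, means[j])) / (2 * var[j])
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            resp.append([v / sum(p) for v in p])
        for j in range(k):  # M-step: re-estimate weight, mean, variance
            nj = sum(r[j] for r in resp) + 1e-9
            weight[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            sq = sum(r[j] * sum((a - b) ** 2 for a, b in zip(x, means[j]))
                     for r, x in zip(resp, X))
            var[j] = max(sq / (nj * d), 1e-6)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def gini(y):
    return 1.0 - sum((c / len(y)) ** 2 for c in Counter(y).values())


def build_tree(X, y, depth=3):
    """Tiny CART-style tree: binary threshold splits chosen by weighted Gini impurity.
    Class labels must not be tuples, since tuples mark internal nodes."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X})[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # all rows identical: no split possible
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def train_cra(X, y, k=2):
    """Cluster the training set, then fit one tree per non-empty cluster."""
    clusters = em_cluster(X, k)
    trees = []
    for j in range(k):
        rows = [i for i, c in enumerate(clusters) if c == j]
        if rows:
            trees.append(build_tree([X[i] for i in rows], [y[i] for i in rows]))
    return trees


def oracle_upper_bound(trees, X_test, y_test):
    """Fraction of test rows that at least one per-cluster tree labels correctly."""
    hits = sum(any(predict(t, x) == y for t in trees) for x, y in zip(X_test, y_test))
    return hits / len(y_test)


# Synthetic two-blob data, purely illustrative (not from the paper).
rng = random.Random(1)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0, 5) for _ in range(40)]
y = [int(a > b) for a, b in X]
X_train, y_train, X_test, y_test = X[::2], y[::2], X[1::2], y[1::2]

trees = train_cra(X_train, y_train, k=2)
upper = oracle_upper_bound(trees, X_test, y_test)
print(f"{len(trees)} trees, upper-bound accuracy = {upper:.2f}")
```

Because an instance counts as correct whenever any tree gets it right, this score can never be lower than the accuracy of any single tree. It is an optimistic ceiling rather than a deployable accuracy, since at prediction time the true label is not available to pick the right tree.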

Juan E. Gilbert | Caio Soares | Philicity Williams
