Classification model with subspace data-dependent balls

Data-Dependent Ball (DDB) is a pre-processing algorithm that transforms quantitative data into binary data by mapping the data onto a set of balls. In datasets with a large number of dimensions, data-dependent balls become less effective because the mapping process relies on distances computed over all dimensions. To reduce the number of ball dimensions, this paper proposes a method for generating subspace data-dependent balls (SDDB). SDDB first ranks features by information gain and then eliminates input features according to a ratio r. Subspace data-dependent balls are then created and filtered according to their size and purity. Finally, a C4.5 decision tree classification model is constructed using the subspace data-dependent balls as features. Experimental results on 8 TICI datasets show that combining SDDB with C4.5 yields higher accuracy than combining DDB with C4.5.
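The SDDB pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Information gain is scored over the best binary threshold split of each numeric feature, r is read as the fraction of features kept, and each ball is centred on a training point with a radius equal to the distance to its k-th nearest neighbour in the selected subspace. The function names select_features, build_balls and to_binary and the parameters k, min_size and min_purity are hypothetical. The C4.5 step is not included; the resulting binary matrix is what would be handed to a decision tree learner.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Best information gain of one numeric feature over binary threshold splits.
    Assumption: C4.5-style threshold splits; the paper may score features differently."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        best = max(best, gain)
    return best


def select_features(X, y, r):
    """Rank features by information gain and keep the top ceil(r * d).
    Assumption: r is the fraction of features retained (hypothetical reading)."""
    d = len(X[0])
    gains = [info_gain([row[j] for row in X], y) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:max(1, math.ceil(r * d))])


def dist(a, b, feats):
    """Euclidean distance computed only over the selected feature subspace."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in feats))


def build_balls(X, y, feats, k=2, min_size=2, min_purity=0.8):
    """Hypothetical ball construction: each training point is a centre whose radius
    is the distance to its k-th nearest neighbour in the subspace. A ball is kept
    only if it covers >= min_size points and the fraction of covered points that
    share the centre's label (its purity) is >= min_purity."""
    balls = []
    for i, centre in enumerate(X):
        ds = sorted(dist(centre, x, feats) for j, x in enumerate(X) if j != i)
        radius = ds[min(k, len(ds)) - 1]
        covered = [y[j] for j, x in enumerate(X) if dist(centre, x, feats) <= radius]
        purity = covered.count(y[i]) / len(covered)
        if len(covered) >= min_size and purity >= min_purity:
            balls.append((centre, radius))
    return balls


def to_binary(X, balls, feats):
    """Map each example to a 0/1 vector: bit b is 1 iff the example lies in ball b."""
    return [[1 if dist(x, c, feats) <= r else 0 for c, r in balls] for x in X]


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[1.0, 5.0, 0.3], [1.25, 4.8, 0.9], [0.75, 5.1, 0.1],
     [3.0, 5.0, 0.5], [3.25, 4.9, 0.2], [2.75, 5.2, 0.8]]
y = ["a", "a", "a", "b", "b", "b"]
feats = select_features(X, y, r=0.3)  # keeps only feature 0 on this data
balls = build_balls(X, y, feats)
B = to_binary(X, balls, feats)        # binary features for a decision tree learner
```

On this toy data every ball is pure, so only the size threshold can prune; raising min_size to 4 discards all six balls because each covers exactly three points. Measuring distances only over the retained features is what keeps the balls low-dimensional.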
