Paper information - Classification model with subspace data-dependent balls

Classification model with subspace data-dependent balls

Data-Dependent Ball (DDB) is a pre-processing algorithm that transforms quantitative data into binary data by mapping it onto a set of balls. In datasets with a large number of dimensions, data-dependent balls become less meaningful because the mapping relies on distance calculations. To reduce the number of ball dimensions, this paper proposes a method for generating subspace data-dependent balls (SDDB). SDDB starts by ranking features by information gain and then eliminates input features according to a ratio r. Subspace data-dependent balls are then created and filtered by size and purity. Finally, a C4.5 decision tree classification model is built using the subspace data-dependent balls as features. Experimental results on 8 UCI datasets show that the combination of SDDB and C4.5 achieves higher accuracy than the combination of DDB and C4.5.
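The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: information gain is computed on equal-width bins, r is read as the fraction of features to eliminate, each ball is defined by a training example as its center and another training example as its border, and the thresholds `min_size` and `min_purity` are hypothetical parameter names. The final C4.5 step is not shown; the binary matrix produced here would be passed to a decision tree learner.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels, bins=4):
    # Assumption: equal-width binning; the paper may discretize differently.
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0
    groups = {}
    for v, lab in zip(column, labels):
        b = min(int((v - lo) / width), bins - 1)
        groups.setdefault(b, []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(X, y, r):
    # Assumption: r is the fraction of features to eliminate (lowest gain first).
    d = len(X[0])
    gains = [info_gain([row[j] for row in X], y) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: gains[j], reverse=True)
    keep = max(1, d - int(r * d))
    return sorted(ranked[:keep])


def dist(a, b, feats):
    # Euclidean distance restricted to the selected subspace.
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in feats))


def build_balls(X, y, feats, min_size=3, min_purity=1.0):
    # Assumption: one candidate ball per (center, border) pair of training
    # examples, with radius equal to their distance in the subspace.
    balls = []
    for i, center in enumerate(X):
        for k, border in enumerate(X):
            if i == k:
                continue
            radius = dist(center, border, feats)
            inside = [y[m] for m, p in enumerate(X)
                      if dist(center, p, feats) <= radius]
            purity = inside.count(y[i]) / len(inside)
            # Keep only balls that are large enough and pure enough.
            if len(inside) >= min_size and purity >= min_purity:
                balls.append((center, radius))
    return balls


def to_binary(X, balls, feats):
    # One binary feature per ball: 1 if the example falls inside it.
    return [[int(dist(row, c, feats) <= rad) for c, rad in balls] for row in X]


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 9.0], [0.1, 9.0, 4.0], [0.3, 3.0, 6.0],
     [1.0, 4.0, 2.0], [1.2, 8.0, 8.0], [1.1, 2.0, 5.0], [1.3, 6.0, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

feats = select_features(X, y, r=0.7)
balls = build_balls(X, y, feats)
B = to_binary(X, balls, feats)
```

Restricting the distance to the selected features is what makes the balls "subspace" balls in this sketch: the noisy dimensions no longer contribute to the radius or to the membership test.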
