Exploration of a hybrid feature selection algorithm

In the knowledge discovery process, classification algorithms are used to build models from training data that can predict the classes of unseen data instances. Many factors influence classification results, such as the node-splitting measure used to construct a decision tree. For large data sets, feature selection is often applied as a pre-classification step to eliminate irrelevant or redundant attributes, increasing computational efficiency and potentially improving classification accuracy. One factor common to both feature selection and decision-tree classification is attribute discretization: the process of mapping attribute values onto a smaller number of discrete values. In this paper, we present and explore a new hybrid approach, ChiBlur, which combines concepts from the blurring and χ2-based approaches to feature selection with concepts from multi-objective optimization. We compare this new algorithm with algorithms based on the blurring and χ2-based approaches.
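To make the χ2-based component concrete, the sketch below scores the relevance of a discretized attribute by the chi-square statistic of its contingency table against the class labels; attributes with higher scores are stronger candidates to keep. This is a generic illustration of χ2 feature scoring, not the ChiBlur algorithm itself, and the function name `chi2_score` is our own.

```python
from collections import Counter

def chi2_score(feature_values, labels):
    """Chi-square statistic between a discrete feature and class labels.

    A higher score indicates stronger dependence between the feature and
    the class, i.e. greater relevance for classification. Assumes the
    feature has already been discretized into a small number of values.
    """
    n = len(labels)
    feat_counts = Counter(feature_values)      # marginal counts per feature value
    label_counts = Counter(labels)             # marginal counts per class
    joint = Counter(zip(feature_values, labels))  # observed joint counts
    score = 0.0
    for f, nf in feat_counts.items():
        for c, nc in label_counts.items():
            expected = nf * nc / n             # expected count under independence
            observed = joint.get((f, c), 0)
            score += (observed - expected) ** 2 / expected
    return score

# A feature perfectly aligned with the class scores far higher than an
# uninformative one, so ranking by chi2_score filters out irrelevant attributes.
labels = [0, 0, 0, 1, 1, 1]
relevant = [0, 0, 0, 1, 1, 1]
irrelevant = [0, 1, 0, 1, 0, 1]
print(chi2_score(relevant, labels))    # maximal association
print(chi2_score(irrelevant, labels))  # near-zero association
```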
