An Improved Feature Subset Selection Method Using Clusters

Feature subset selection is the process of eliminating irrelevant information and redundant features during data mining. Most existing feature subset selection algorithms can effectively eliminate irrelevant features but fail to handle redundant ones. The Improved FAST algorithm first eliminates irrelevant features and then removes redundant features from the resulting set. It accomplishes four tasks. First, the irrelevant features are removed. Second, the remaining features are divided into clusters such that the features within each cluster are mutually redundant. Third, the most representative feature, the one most closely related to the target classes, is selected from each cluster. Fourth, the selected features are combined to form the final feature subset. The efficiency of the algorithm is improved by applying a search method that minimizes the search time. The Improved FAST algorithm is evaluated on various types of data, such as text data, microarray data, and image data, to demonstrate its performance.
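The four tasks above can be sketched in code. This is a minimal illustration, not the paper's implementation: the function name `fast_select`, the thresholds, and the use of absolute Pearson correlation in place of a symmetric-uncertainty measure are all assumptions made for clarity.

```python
# Hypothetical sketch of a FAST-style clustering-based feature subset selection.
# Pearson correlation stands in for the relevance/redundancy measure; the
# thresholds and greedy clustering are illustrative, not the paper's method.
import numpy as np

def correlation(a, b):
    """Absolute Pearson correlation; 0 if either variable is constant."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])

def fast_select(X, y, relevance_threshold=0.1, redundancy_threshold=0.8):
    n_features = X.shape[1]
    # Task 1: remove irrelevant features (weak correlation with the target).
    relevant = [j for j in range(n_features)
                if correlation(X[:, j], y) > relevance_threshold]
    # Task 2: greedily group mutually redundant features into clusters.
    clusters = []
    for j in relevant:
        placed = False
        for c in clusters:
            if correlation(X[:, j], X[:, c[0]]) > redundancy_threshold:
                c.append(j)
                placed = True
                break
        if not placed:
            clusters.append([j])
    # Tasks 3-4: keep the most target-relevant feature from each cluster
    # and return the resulting feature subset.
    return sorted(max(c, key=lambda j: correlation(X[:, j], y))
                  for c in clusters)
```

For example, given two duplicated informative columns and one constant column, the sketch keeps a single representative feature: the duplicates fall into one cluster and the constant column is filtered out as irrelevant.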
