Optimized Feature Selection Algorithm for High Dimensional Data

Objectives: This paper proposes a feature selection method that combines fuzzy entropy with the firefly algorithm to select high-quality features while removing redundant and irrelevant attributes from high-dimensional data. Methods/Statistical Analysis: Feature selection is a data preprocessing step that reduces dimensionality, eliminates irrelevant data and improves accuracy. In the pattern space, fuzzy entropy is used to estimate the knowledge of the pattern distribution. The Firefly Algorithm is a computational model inspired by the light-emitting behaviour of fireflies. This work proposes an algorithm that selects features by integrating fuzzy entropy with the firefly algorithm. Its performance is analyzed on four high-dimensional data sets: WILT, ORL, LC and LTG. Findings: Experiments on the four data sets show that the proposed algorithm outperforms the traditional feature selection method, achieving maximum relevance and minimum redundancy. Sensitivity, specificity and accuracy all improve significantly compared with the existing FCBF algorithm. Applications/Improvements: The proposed algorithm improves performance by eliminating redundant, noisy and insignificant features and can be applied to all high-dimensional data sets.
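
The abstract does not give the membership function, the fitness function or the parameter settings, so the following Python sketch is only an illustration of how fuzzy entropy and a firefly search might be combined. It assumes the normalized De Luca-Termini form of fuzzy entropy, treats lower entropy as a sign of a more discriminating feature, and uses the standard continuous firefly update with positions thresholded at 0.5 to obtain a feature mask. The names `subset_cost` and `firefly_select`, the size penalty and all default parameter values are hypothetical and are not taken from the paper.

```python
import math
import random

def fuzzy_entropy(memberships):
    """Normalized De Luca-Termini fuzzy entropy of membership degrees in [0, 1]."""
    total = 0.0
    for mu in memberships:
        if 0.0 < mu < 1.0:  # terms at mu = 0 or mu = 1 contribute zero
            total -= mu * math.log2(mu) + (1.0 - mu) * math.log2(1.0 - mu)
    return total / len(memberships)

def subset_cost(position, entropies, size_weight=0.1):
    """Hypothetical cost: mean entropy of the selected features plus a size penalty."""
    chosen = [h for x, h in zip(position, entropies) if x > 0.5]
    if not chosen:
        return float("inf")  # an empty subset is not a valid solution
    return sum(chosen) / len(chosen) + size_weight * len(chosen) / len(entropies)

def firefly_select(entropies, n_fireflies=10, n_iter=30,
                   beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Search for a low-cost feature mask with a basic firefly algorithm."""
    rng = random.Random(seed)
    dim = len(entropies)
    # Each firefly is a point in [0, 1]^dim; coordinate > 0.5 means "feature selected".
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    costs = [subset_cost(x, entropies) for x in swarm]
    best_cost = min(costs)
    best = list(swarm[costs.index(best_cost)])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    costs[i] = subset_cost(swarm[i], entropies)
                    if costs[i] < best_cost:
                        best_cost, best = costs[i], list(swarm[i])
    return [x > 0.5 for x in best], best_cost

# Toy membership degrees for four features (illustrative values, not real data).
feature_memberships = [
    [0.95, 0.05, 0.9, 0.1],   # near-crisp memberships: low entropy
    [0.5, 0.6, 0.4, 0.5],     # ambiguous memberships: high entropy
    [0.8, 0.2, 0.85, 0.1],
    [0.45, 0.55, 0.5, 0.6],
]
entropies = [fuzzy_entropy(m) for m in feature_memberships]
mask, cost = firefly_select(entropies)
print(mask, cost)
```

In this sketch the brightness of a firefly is the negative of its cost, so fireflies drift toward subsets whose features have low fuzzy entropy, and the size penalty discourages keeping every feature. A fitness that also measured redundancy between features, as the abstract's relevance and redundancy claims suggest, would need an additional pairwise term that is not shown here.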
