A new hybrid feature selection approach using feature association map for supervised and unsupervised classification

Abstract

Feature selection for both supervised and unsupervised classification is a problem that researchers have pursued for decades. Multiple benchmark algorithms exist, based on filter, wrapper, and hybrid methods. These algorithms adopt techniques that range from traditional search-based methods to more advanced nature-inspired algorithms. This paper proposes a hybrid feature selection algorithm that uses a graph-based technique. The proposed algorithm builds on the concept of a Feature Association Map (FAM) and uses the graph-theoretic principles of minimal vertex cover and maximal independent set to derive the feature subset. The algorithm applies to both supervised and unsupervised classification. Its performance has been compared with several benchmark supervised and unsupervised feature selection algorithms and found to be better. The proposed algorithm is also less computationally expensive and hence took less execution time on the publicly available datasets used in the experiments, which include high-dimensional datasets.
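To make the graph-theoretic idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that feature association is measured by absolute Pearson correlation, that an edge joins two features whose association reaches a threshold of 0.8, and that a maximal independent set is built greedily in order of increasing degree. The function names, the threshold, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def association_graph(columns, threshold=0.8):
    """Assumed association map: an edge joins features with |r| >= threshold."""
    adj = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_maximal_independent_set(adj):
    """Visit vertices by increasing degree; keep a vertex unless a neighbour was kept."""
    chosen, blocked = set(), set()
    for v in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if v not in blocked:
            chosen.add(v)
            blocked |= adj[v]
    return chosen


# Toy data: f2 duplicates f1 up to scale, f4 is the negation of f3.
columns = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],
    "f3": [1, -2, 2, -2, 1],
    "f4": [-1, 2, -2, 2, -1],
}

adj = association_graph(columns)
selected = greedy_maximal_independent_set(adj)  # mutually non-associated features
cover = set(adj) - selected                     # complement is a minimal vertex cover

print(sorted(selected), sorted(cover))
```

In this sketch the independent set keeps one representative from each group of strongly associated features, while its complement touches every edge of the graph and therefore forms a vertex cover. Which of the two sets is taken as the final feature subset, and how class labels enter in the supervised case, is specified in the paper itself and is not modeled here.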
