MGFS: A multi-label graph-based feature selection algorithm via PageRank centrality

Abstract In multi-label data, each instance is associated with a set of labels rather than a single label. Each label has its own column in the data set, in which instances that belong to the label are assigned 1 and instances that do not are assigned 0. Such data are usually high-dimensional, so many methods use machine learning algorithms to choose the best subset of features, reducing the dimensionality of the data before building an acceptable classification model. In this paper, we design a fast feature selection algorithm for multi-label data based on PageRank, an effective method for calculating the importance of web pages on the Internet. The algorithm, called multi-label graph-based feature selection (MGFS), first constructs an M × L matrix, called the Correlation Distance Matrix (CDM), where M is the number of features and L is the number of class labels. MGFS then creates a complete weighted graph, called the Feature-Label Graph (FLG), in which each feature is a vertex and the weight of the edge between two vertices (features) is the Euclidean distance between their rows in the CDM. Finally, the importance of each vertex (feature) is estimated with the PageRank algorithm. The number of selected features can be set by the user. To evaluate the proposed algorithm, we compare it against several multi-label feature selection methods on several multi-label datasets of different dimensions. The results show the superiority of the proposed method in terms of classification criteria and run-time.
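The three steps described in the abstract (CDM, FLG, PageRank) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the CDM entries, so the sketch assumes each entry is one minus the absolute Pearson correlation between a feature and a label. It also assumes a standard weighted PageRank with damping factor 0.85 and that the highest-ranked vertices are the features to keep. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_distance_matrix(X, Y):
    """M x L matrix; entry (i, j) is ASSUMED to be 1 - |corr(feature i, label j)|."""
    features = list(zip(*X))  # M columns of X
    labels = list(zip(*Y))    # L columns of Y
    return [[1.0 - abs(pearson(f, l)) for l in labels] for f in features]


def feature_label_graph(cdm):
    """Complete weighted graph: weight(i, j) = Euclidean distance between CDM rows."""
    m = len(cdm)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(cdm[i], cdm[j]))) for j in range(m)]
        for i in range(m)
    ]


def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a weighted graph (damping value is an assumption)."""
    n = len(W)
    out = [sum(row) for row in W]  # total outgoing weight per vertex
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank mass of vertices with no outgoing weight is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0) / n
        new = []
        for j in range(n):
            incoming = sum(rank[i] * W[i][j] / out[i] for i in range(n) if out[i] > 0)
            new.append((1.0 - damping) / n + damping * (incoming + dangling))
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def mgfs_select(X, Y, k):
    """Return indices of the k top-ranked features and all PageRank scores."""
    cdm = correlation_distance_matrix(X, Y)
    W = feature_label_graph(cdm)
    scores = weighted_pagerank(W)
    # ASSUMPTION: a higher PageRank score means a more important feature.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Toy data: 4 instances, 3 features, 2 binary labels (made up for illustration).
X = [[1, 0, 5], [2, 1, 3], [3, 0, 4], [4, 1, 1]]
Y = [[0, 1], [0, 0], [1, 1], [1, 0]]
selected, scores = mgfs_select(X, Y, k=2)
print(selected, scores)
```

Here `k` plays the role of the user-chosen number of features. Because the FLG is complete, its weight matrix is dense, so this sketch costs O(M²) memory and O(M²) time per PageRank iteration.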
