Feature selection using Lebesgue and entropy measures for incomplete neighborhood decision systems

Abstract Feature selection for mixed and incomplete data, that is, data with numerical and categorical features and missing values, has recently gained considerable attention. Neighborhood rough sets-based feature selection is an important step toward improving classification performance, especially on incomplete data with mixed continuous numerical and categorical features. In this paper, a novel feature selection method based on neighborhood rough sets using Lebesgue and entropy measures in incomplete neighborhood decision systems is proposed. The method can handle mixed and incomplete datasets while maintaining the original classification information. First, a Lebesgue measure based on the neighborhood tolerance class is developed to study the positive region and dependency degree. Second, to thoroughly analyze the uncertainty, noise and incompleteness of incomplete neighborhood decision systems, several neighborhood tolerance entropy-based uncertainty measures are presented based on Lebesgue and entropy measures. Then, by combining the algebraic view with the information view in neighborhood rough sets, the neighborhood tolerance dependency joint entropy is defined in incomplete neighborhood decision systems. Moreover, the corresponding properties are discussed, and the relationships among these measures are established to convey the knowledge essence and investigate the uncertainty of incomplete neighborhood decision systems. Finally, for high-dimensional datasets, the Fisher score method is used to preliminarily eliminate irrelevant features and thereby significantly reduce the computational complexity, and a heuristic feature selection algorithm is designed to improve the classification performance on mixed and incomplete datasets. Experiments on an illustrative instance and fifteen public datasets demonstrate that the proposed feature selection method is effective in selecting the most relevant features and achieves strong classification ability for incomplete neighborhood decision systems.
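
The abstract does not give formal definitions, so the following is only a minimal sketch of how a neighborhood tolerance class, the positive region, and the dependency degree might be computed on mixed, incomplete data. It rests on several assumptions that are not taken from the paper: missing values are encoded as `None`; two samples are treated as tolerant when, on every selected feature, either value is missing, the categorical values are equal, or the numerical values differ by at most a radius `delta`; and for a finite sample set plain cardinality stands in for the Lebesgue measure. The greedy loop uses the dependency degree alone as its criterion, which is a simplification and not the neighborhood tolerance dependency joint entropy proposed in the paper. All function names are hypothetical.

```python
MISSING = None  # assumed encoding of a missing value


def tolerant(x, y, features, numeric, delta):
    """Return True if samples x and y cannot be told apart on `features`.

    Assumed rule: a missing value matches anything, categorical values
    must be equal, numerical values must lie within `delta` of each other.
    """
    for f in features:
        a, b = x[f], y[f]
        if a is MISSING or b is MISSING:
            continue
        if f in numeric:
            if abs(a - b) > delta:
                return False
        elif a != b:
            return False
    return True


def tolerance_classes(data, features, numeric, delta):
    """Neighborhood tolerance class (as a set of indices) of every sample."""
    return [
        {j for j, y in enumerate(data) if tolerant(x, y, features, numeric, delta)}
        for x in data
    ]


def dependency_degree(data, labels, features, numeric, delta):
    """Share of samples whose tolerance class lies inside one decision class."""
    classes = tolerance_classes(data, features, numeric, delta)
    positive = [
        i for i, nbrs in enumerate(classes)
        if all(labels[j] == labels[i] for j in nbrs)
    ]
    # Cardinality is used here in place of a Lebesgue measure (assumption).
    return len(positive) / len(data)


def greedy_select(data, labels, features, numeric, delta):
    """Forward selection: add the feature that raises dependency the most
    until the dependency of the full feature set is reached."""
    target = dependency_degree(data, labels, features, numeric, delta)
    selected, current = [], 0.0
    while current < target:
        remaining = [f for f in features if f not in selected]
        best = max(
            remaining,
            key=lambda f: dependency_degree(
                data, labels, selected + [f], numeric, delta
            ),
        )
        selected.append(best)
        current = dependency_degree(data, labels, selected, numeric, delta)
    return selected


# Toy incomplete decision system: "a" is numerical, "b" is categorical.
data = [
    {"a": 0.10, "b": "x"},
    {"a": 0.15, "b": "x"},
    {"a": 0.80, "b": None},
    {"a": None, "b": "y"},
]
labels = [0, 0, 1, 1]

print(dependency_degree(data, labels, ["a", "b"], {"a"}, 0.1))  # 1.0
print(dependency_degree(data, labels, ["b"], {"a"}, 0.1))       # 0.25
print(greedy_select(data, labels, ["a", "b"], {"a"}, 0.1))      # ['a', 'b']
```

In this toy system neither feature alone separates the decision classes, because the missing values make the tolerance classes too coarse, while the two features together yield a dependency degree of 1. Under the assumed tolerance rule, adding features can only shrink tolerance classes, so the greedy loop always terminates.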
