Fast feature selection for interval-valued data through kernel density estimation entropy

Kernel density estimation, a non-parametric method for estimating the probability density of random variables, has been used in feature selection. However, existing feature selection methods based on kernel density estimation seldom consider interval-valued data, even though such data are widespread. In this paper, we propose a feature selection method based on kernel density estimation for interval-valued data. First, the kernel function in kernel density estimation is defined for interval-valued data. Second, an interval-valued kernel density estimation probability structure is constructed from the defined kernel function, comprising the kernel density estimation conditional probability, joint probability and posterior probability. Third, kernel density estimation entropies for interval-valued data are derived from this probability structure, namely the information entropy, conditional entropy and joint entropy. Fourth, a feature selection approach based on kernel density estimation entropy is proposed. We then improve this algorithm to obtain a fast feature selection algorithm based on kernel density estimation entropy. Finally, comparative experiments assess computing time, intuitive identifiability and classification performance, showing the feasibility and effectiveness of the proposed method.
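
The pipeline summarized above (kernel on intervals, kernel-based posterior probabilities, conditional entropy, greedy selection) can be illustrated with a minimal Python sketch. It is not the paper's definition: the abstract does not give the kernel, so the sketch assumes a Gaussian kernel on the squared Euclidean distance between interval endpoints, a Parzen-window posterior in the style of Kwak and Choi, and a plain forward search. The names `sq_dist`, `cond_entropy`, `greedy_select` and the bandwidth `h` are hypothetical, and the fast variant mentioned in the abstract is not reproduced here.

```python
import math

def sq_dist(x, y, feats):
    # Squared distance between two samples over the chosen features.
    # Each feature value is an interval (lower, upper); comparing both
    # endpoints is an assumption, not the paper's kernel definition.
    return sum((x[f][0] - y[f][0]) ** 2 + (x[f][1] - y[f][1]) ** 2 for f in feats)

def cond_entropy(X, labels, feats, h=1.0):
    # Parzen-window estimate of H(label | feats): for every sample, the
    # class posterior is the kernel-weighted share of each class.
    n = len(X)
    classes = sorted(set(labels))
    total = 0.0
    for i in range(n):
        w = [math.exp(-sq_dist(X[i], X[j], feats) / (2 * h * h)) for j in range(n)]
        s = sum(w)  # at least 1, because w[i] == exp(0)
        for c in classes:
            p = sum(wj for wj, lab in zip(w, labels) if lab == c) / s
            if p > 0:
                total -= p * math.log(p)
    return total / n

def greedy_select(X, labels, k, h=1.0):
    # Forward selection: repeatedly add the feature that yields the
    # lowest conditional entropy together with those already chosen.
    remaining = list(range(len(X[0])))
    chosen = []
    while remaining and len(chosen) < k:
        best = min(remaining, key=lambda f: cond_entropy(X, labels, chosen + [f], h))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: feature 0 separates the two classes, feature 1 is constant.
X = [
    [(0.0, 1.0), (5.0, 6.0)],
    [(0.2, 1.1), (5.0, 6.0)],
    [(9.0, 10.0), (5.0, 6.0)],
    [(9.1, 10.2), (5.0, 6.0)],
]
labels = [0, 0, 1, 1]
selected = greedy_select(X, labels, k=1)
```

On this toy data the constant feature leaves the class posterior at one half for both classes, so its conditional entropy is log 2, while the separating feature drives the entropy close to zero and is selected first. Each entropy evaluation in the sketch costs on the order of n squared kernel computations, which is the kind of cost a faster algorithm would aim to reduce.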
