Unsupervised attribute reduction for mixed data based on fuzzy rough sets

Abstract: Unsupervised attribute reduction aims to select a subset of attributes that maintains learning ability without using decision information, and the lack of such information makes the task very challenging. Most existing unsupervised attribute reduction methods are designed for either numerical or nominal attributes, and little research has addressed unsupervised reduction of mixed attributes. In view of this, this paper proposes a generalized unsupervised mixed attribute reduction model based on fuzzy rough sets. First, a significance measure, defined over all single-attribute subsets, is introduced to indicate the importance of a candidate attribute. Then, a specific fuzzy rough-based unsupervised attribute reduction (FRUAR) algorithm is designed. Finally, the proposed algorithm is compared with existing algorithms on thirty public data sets. Experimental results show that FRUAR can select fewer attributes while maintaining or improving the performance of learning algorithms, and that it is suitable for mixed attribute data.
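
The abstract gives no formulas, so the following is only an illustrative sketch of how a fuzzy rough, unsupervised, greedy reduction over mixed data could look. It is not the paper's FRUAR definition. Everything here is an assumption made for illustration: the function names (`attribute_relation`, `dependency`, `greedy_reduct`), the similarity relations (crisp equality for nominal attributes, `1 - |x - y|` on min-max scaled values for numerical ones), the use of `min` to combine relations, the implicator `max(1 - r, s)`, and the stopping tolerance `tol`.

```python
import numpy as np

def attribute_relation(col, is_nominal):
    """Fuzzy similarity matrix induced by a single attribute (assumed form)."""
    col = np.asarray(col)
    if is_nominal:
        # Nominal values are similar only when they are equal.
        return (col[:, None] == col[None, :]).astype(float)
    col = col.astype(float)
    span = col.max() - col.min()
    if span == 0:
        return np.ones((len(col), len(col)))
    scaled = (col - col.min()) / span
    return 1.0 - np.abs(scaled[:, None] - scaled[None, :])

def dependency(r_subset, r_target):
    """Mean membership of each sample in the fuzzy lower approximation of its
    own r_target class, computed under the relation r_subset."""
    lower = np.min(np.maximum(1.0 - r_subset, r_target), axis=1)
    return float(lower.mean())

def greedy_reduct(columns, nominal_flags, tol=1e-6):
    """Forward selection: add the attribute that most increases the average
    dependency of all single attributes on the selected subset."""
    relations = [attribute_relation(c, f) for c, f in zip(columns, nominal_flags)]
    n = relations[0].shape[0]

    def score(r_subset):
        return sum(dependency(r_subset, r) for r in relations) / len(relations)

    selected = []
    r_selected = np.ones((n, n))  # empty subset: all samples indistinguishable
    current = score(r_selected)
    while len(selected) < len(relations):
        best_j, best_score, best_rel = None, current, None
        for j, r in enumerate(relations):
            if j in selected:
                continue
            r_cand = np.minimum(r_selected, r)  # combine relations with min
            s = score(r_cand)
            if s > best_score + tol:
                best_j, best_score, best_rel = j, s, r_cand
        if best_j is None:
            break  # no candidate improves the score by more than tol
        selected.append(best_j)
        r_selected, current = best_rel, best_score
    return selected
```

Under these assumptions, a call such as `greedy_reduct([heights, weights, colors], [False, False, True])` takes one 1-D array per attribute plus a flag marking which attributes are nominal, and returns the indices of the selected attributes. An attribute that duplicates an already selected one yields no gain and is therefore never added.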
