A clustering algorithm based on the weighted entropy of conditional attributes for mixed data

A novel definition of weighted entropy is proposed to improve clustering performance on small and diverse datasets. First, intra-class and inter-class weighted entropies are developed for categorical and numeric conditional attributes, respectively, from the mathematical definition of entropy. Second, the weighted entropies are used to calculate cluster weights for mixed conditional attributes. A weighted clustering algorithm that adopts entropy as its primary description term and integrates the corresponding distance calculation mechanism is then introduced. Finally, a theoretical analysis and a validation experiment are conducted using the UC-Irvine dataset. The results show that the proposed algorithm offers high self-adaptability, with clustering performance superior to that of the existing K-prototypes, SBAC, and OCIL algorithms.
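
The abstract does not give the intra-class and inter-class entropy formulas, so the following is only a minimal sketch of the general idea of entropy-derived attribute weights feeding a mixed-attribute distance. Every name here (`shannon_entropy`, `entropy_weights`, `mixed_distance`) and the weighting rule `1 / (1 + H)` are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(columns):
    """Hypothetical weighting rule (an assumption, not the paper's formula):
    an attribute with lower entropy inside a cluster gets a larger weight.
    `columns` holds one sequence of categorical values per attribute."""
    raw = [1.0 / (1.0 + shannon_entropy(col)) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]


def mixed_distance(x, y, categorical_idx, weights):
    """Weighted distance between two mixed records: 0/1 mismatch for
    categorical attributes, squared difference for numeric attributes."""
    d = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if j in categorical_idx:
            d += weights[j] * (0.0 if a == b else 1.0)
        else:
            d += weights[j] * (a - b) ** 2
    return d


# Two categorical attributes observed within one (made-up) cluster.
colour = ["red", "red", "red", "red"]  # constant inside the cluster
shape = ["a", "b", "a", "b"]           # evenly split inside the cluster
cat_weights = entropy_weights([colour, shape])

# Distance between two records with one categorical and one numeric attribute,
# using hand-picked equal weights for illustration.
dist = mixed_distance(("red", 1.0), ("blue", 3.0), {0}, [0.5, 0.5])
```

In this toy cluster the constant attribute receives the larger weight, which matches the intuition that an attribute with little variation inside a cluster describes that cluster well. A K-prototypes-style loop could recompute such weights after each reassignment step; whether the proposed algorithm does so is not stated in the abstract.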

[1]  Duoqian Miao,et al.  Entropy-based multi-view matrix completion for clustering with side information , 2019, Pattern Analysis and Applications.

[2]  Jinjun Chen,et al.  Differential Privacy Techniques for Cyber Physical Systems: A Survey , 2018, IEEE Communications Surveys & Tutorials.

[3]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[4]  Penghong Wang,et al.  A Gaussian error correction multi‐objective positioning model with NSGA‐II , 2019, Concurr. Comput. Pract. Exp..

[5]  Zhenlong Li,et al.  Big Data and cloud computing: innovation opportunities and challenges , 2017, Int. J. Digit. Earth.

[6]  Xiao Xu,et al.  An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood , 2017, Knowl. Based Syst..

[7]  Jinjun Chen,et al.  Privacy preservation in blockchain based IoT systems: Integration issues, prospects, challenges, and future research directions , 2019, Future Gener. Comput. Syst..

[8]  Olle Findahl,et al.  The Effect of Visual Illustrations upon Perception and Retention of News Programmes , 1981 .

[9]  Chan-Gook Park,et al.  Cardinality compensation method based on information-weighted consensus filter using data clustering for multi-target tracking , 2019 .

[10]  Zhigang Lu,et al.  FEW-NNN: A fuzzy entropy weighted natural nearest neighbor method for flow-based network traffic attack detection , 2020, China Communications.

[11]  Jinjun Chen,et al.  DEAL: Differentially Private Auction for Blockchain-Based Microgrids Energy Trading , 2020, IEEE Transactions on Services Computing.

[12]  Zhang Yuxian,et al.  Self-organizing mapping clustering algorithm based on heterogeneous value difference metric for mixed attribute data , 2016 .

[13]  Gautam Biswas,et al.  Unsupervised Learning with Mixed Numeric and Nominal Data , 2002, IEEE Trans. Knowl. Data Eng..

[14]  LI Xiang-l Clustering boundary detection technology for mixed attribute data set , 2015 .

[15]  Jinjun Chen,et al.  A Multicloud-Model-Based Many-Objective Intelligent Algorithm for Efficient Task Scheduling in Internet of Things , 2021, IEEE Internet of Things Journal.

[16]  Youpeng Xu,et al.  Effects of industry structures on water quality in different urbanized regions using an improved entropy-weighted matter-elementmethodology , 2019, Environmental Science and Pollution Research.

[17]  Hai Jin,et al.  A throughput maximization strategy for scheduling transaction‐intensive workflows on SwinDeW‐G , 2008, Concurr. Comput. Pract. Exp..

[18]  Zhihua Cui,et al.  Personalized Recommendation System Based on Collaborative Filtering for IoT Scenarios , 2020, IEEE Transactions on Services Computing.

[19]  Ahmed M. Khedr,et al.  An information entropy based-clustering algorithm for heterogeneous wireless sensor networks , 2018, Wirel. Networks.

[20]  Zhihua Cui,et al.  A Hybrid BlockChain-Based Identity Authentication Scheme for Multi-WSN , 2020, IEEE Transactions on Services Computing.

[21]  Pierpaolo D'Urso,et al.  Fuzzy clustering of fuzzy data based on robust loss functions and ordered weighted averaging , 2020, Fuzzy Sets Syst..

[22]  Zhihua Cui,et al.  An under‐sampled software defect prediction method based on hybrid multi‐objective cuckoo search , 2019, Concurr. Comput. Pract. Exp..

[23]  Hong Jia,et al.  Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number , 2013, Pattern Recognit..

[24]  Penghong Wang,et al.  A Multi-Objective DV-Hop Localization Algorithm Based on NSGA-II in Internet of Things , 2019, Mathematics.

[25]  Jinjun Chen,et al.  A Dynamic Key Length Based Approach for Real-Time Security Verification of Big Sensing Data Stream , 2015, WISE.

[26]  Miin-Shen Yang,et al.  Feature-Weighted Possibilistic c-Means Clustering With a Feature-Reduction Framework , 2021, IEEE Transactions on Fuzzy Systems.