An Efficient Grid-Based K-Prototypes Algorithm for Sustainable Decision-Making on Spatial Objects

Data mining plays a critical role in sustainable decision-making. Although the k-prototypes algorithm is one of the best-known algorithms for clustering data with both numeric and categorical attributes, it remains inefficient for large sets of spatial objects with mixed attributes because of its computational complexity. In this paper, we propose an efficient grid-based k-prototypes algorithm, GK-prototypes, which achieves high performance for clustering spatial objects. The first proposed algorithm uses both the maximum and the minimum distance between cluster centers and a cell to avoid unnecessary distance calculations. The second proposed algorithm, an extension of the first, exploits spatial dependence, the tendency of spatial objects that are close to each other to be similar. Each cell has a bitmap index that stores, for each attribute, the categorical values of all objects within that cell. This bitmap index can improve performance when the categorical data is skewed. Experimental results show that the proposed algorithms achieve better performance than the existing pruning techniques for the k-prototypes algorithm.
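
The cell-level pruning idea described above can be illustrated with a minimal Python sketch. It is not the paper's exact procedure: it assumes axis-aligned rectangular cells, squared Euclidean distance on the numeric attributes, a simple mismatch count weighted by a hypothetical parameter `gamma` on the categorical attributes, and a per-attribute integer bitmask as the cell's bitmap index. All function and parameter names are illustrative.

```python
from typing import List, Sequence, Tuple


def min_sq_dist(center: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Smallest squared Euclidean distance from a center to any point in the cell [lo, hi]."""
    total = 0.0
    for c, l, h in zip(center, lo, hi):
        d = max(l - c, 0.0, c - h)  # 0 when the center lies inside the cell on this axis
        total += d * d
    return total


def max_sq_dist(center: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Largest squared Euclidean distance from a center to any point in the cell [lo, hi]."""
    total = 0.0
    for c, l, h in zip(center, lo, hi):
        d = max(abs(c - l), abs(c - h))  # the farthest cell face on this axis
        total += d * d
    return total


def categorical_bounds(center_codes: Sequence[int], cell_bitmaps: Sequence[int]) -> Tuple[int, int]:
    """Lower and upper bound on the mismatch count between a center and any object in the cell.

    cell_bitmaps[a] has bit v set if some object in the cell has value code v
    for categorical attribute a (an assumed encoding of the bitmap index).
    """
    lower = upper = 0
    for v, bitmap in zip(center_codes, cell_bitmaps):
        if not (bitmap >> v) & 1:
            lower += 1  # no object in the cell has the center's value: certain mismatch
        if bitmap & ~(1 << v):
            upper += 1  # some object has a different value: possible mismatch
    return lower, upper


def candidate_centers(
    numeric_centers: Sequence[Sequence[float]],
    categorical_centers: Sequence[Sequence[int]],
    lo: Sequence[float],
    hi: Sequence[float],
    cell_bitmaps: Sequence[int],
    gamma: float,
) -> List[int]:
    """Indices of centers that could be nearest to at least one object in the cell."""
    lowers, uppers = [], []
    for num, cat in zip(numeric_centers, categorical_centers):
        cat_lo, cat_hi = categorical_bounds(cat, cell_bitmaps)
        lowers.append(min_sq_dist(num, lo, hi) + gamma * cat_lo)
        uppers.append(max_sq_dist(num, lo, hi) + gamma * cat_hi)
    best_upper = min(uppers)
    # A center whose lower bound exceeds another center's upper bound can never win.
    return [i for i, low in enumerate(lowers) if low <= best_upper]


numeric = [(0.5, 0.5), (10.0, 10.0), (1.5, 0.5)]
categorical = [(0, 1), (0, 1), (2, 0)]
# Skewed cell: every object has value 0 for attribute 0 and value 1 for attribute 1.
bitmaps = [0b001, 0b010]
survivors = candidate_centers(numeric, categorical, (0.0, 0.0), (1.0, 1.0), bitmaps, gamma=1.0)
```

In this example only the centers returned in `survivors` need per-object distance calculations for the cell. The bounds tighten when a cell's categorical values are skewed toward a single value, which is consistent with the abstract's remark about skewed data.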
