A tree-based incremental overlapping clustering method using the three-way decision theory

Existing clustering approaches are usually restricted to crisp clustering, in which each object belongs to exactly one cluster; however, in some applications an object may belong to more than one cluster. In addition, existing clustering approaches usually analyze static datasets whose objects remain unchanged after being processed, whereas many practical datasets are modified dynamically, so some previously learned patterns must be updated accordingly. In this paper, we propose a new tree-based incremental overlapping clustering method using the three-way decision theory. The tree is constructed from representative points introduced in this paper, which can improve the relevance of search results. Each overlapping cluster is represented by a three-way decision with interval sets, and three-way decision strategies are designed to update the clustering as the data grow. Furthermore, the proposed method can determine the number of clusters during processing. The experimental results show that it can identify clusters of arbitrary shapes without sacrificing computing time, and further comparison experiments show that the proposed method outperforms the compared algorithms in most cases.
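The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of incremental three-way assignment to interval-set clusters, not the authors' algorithm. Each cluster is stored as an interval set (a core, or lower bound, plus a fringe; the upper bound is their union). An arriving object is accepted into a core, deferred to the fringe of every nearby cluster (which is how overlap arises), or rejected by all clusters, in which case a new cluster is started. The class `IntervalCluster`, the function `assign`, the thresholds `r_core` and `r_fringe`, the "distance to nearest representative point" rule, and adding accepted objects as new representatives are all hypothetical choices made for illustration; the paper's tree over representative points is not modeled here.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, ...]


@dataclass
class IntervalCluster:
    """Overlapping cluster as an interval set: core (lower bound) and fringe.

    Hypothetical representation; the upper bound is core | fringe.
    """
    representatives: list[Point]
    core: set[int] = field(default_factory=set)
    fringe: set[int] = field(default_factory=set)

    def upper(self) -> set[int]:
        return self.core | self.fringe

    def distance(self, x: Point) -> float:
        # Assumption: distance to the nearest representative point.
        return min(math.dist(x, r) for r in self.representatives)


def assign(x_id: int, x: Point, clusters: list[IntervalCluster],
           r_core: float, r_fringe: float) -> str:
    """Three-way decision for one incoming object.

    Returns "core" (accept), "fringe" (defer) or "new" (rejected by all
    existing clusters, so a new cluster is created). Mutates `clusters`.
    """
    dists = [c.distance(x) for c in clusters]
    # Clusters that do not reject the object.
    near = [i for i, d in enumerate(dists) if d <= r_fringe]
    if not near:
        # Rejected everywhere: the number of clusters grows on the fly.
        clusters.append(IntervalCluster(representatives=[x], core={x_id}))
        return "new"
    best = min(near, key=lambda i: dists[i])
    if dists[best] <= r_core and len(near) == 1:
        # Accept: close to exactly one cluster.
        clusters[best].core.add(x_id)
        # Hypothetical update: accepted objects extend the cluster's shape.
        clusters[best].representatives.append(x)
        return "core"
    # Defer: the object lies in the boundary of every nearby cluster.
    for i in near:
        clusters[i].fringe.add(x_id)
    return "fringe"


if __name__ == "__main__":
    clusters: list[IntervalCluster] = []
    stream = [(0.0, 0.0), (0.5, 0.0), (4.0, 0.0), (2.2, 0.0)]
    for i, p in enumerate(stream):
        print(i, assign(i, p, clusters, r_core=1.0, r_fringe=2.0))
```

In this toy stream, object 3 at `(2.2, 0.0)` lies within `r_fringe` of both clusters, so it is deferred into both fringes and ends up in the upper bound of two clusters at once. Processing one object at a time and creating clusters only when every existing cluster rejects an object is what lets the number of clusters emerge during processing rather than being fixed in advance.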
