Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine

Clustering algorithms are a basis of data analysis and are widely used in analysis systems. However, when the data are high-dimensional, a clustering algorithm may overlook the business relations between the dimensions, especially in the medical field. As a result, the clustering result often fails to meet the user's business goals. If the clustering process can incorporate the user's knowledge, that is, the doctor's knowledge or analysis intent, the result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of the method is to collect the user's feedback on the clustering result and use it to optimize that result. A particle swarm optimization algorithm is used to optimize the parameters, in particular the weight settings of the clustering algorithm, so that they reflect the user's business preferences as closely as possible. With the parameters optimized and adjusted in this way, the clustering result moves closer to the user's requirements. Finally, we use a breast cancer example to test our method. The experiments show that our algorithm achieves better performance.
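The loop described above (cluster, collect feedback, re-tune the weights, cluster again) can be illustrated with a minimal sketch. The abstract does not specify how feedback is encoded or how fitness is defined, so everything here is an assumption made for illustration: feedback is taken to be pairs of records the user says belong together (`must_link`) or apart (`cannot_link`), the fitness is the fraction of those pairs a clustering satisfies, and the function names, PSO coefficients (0.7, 1.5, 1.5), and toy data are hypothetical rather than taken from the paper.

```python
import random

def weighted_kmeans(X, k, w, n_iter=20, seed=0):
    """Lloyd's K-means with the weighted squared distance sum_d w[d]*(x[d]-c[d])**2."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: nearest center under the weighted distance.
        for i, x in enumerate(X):
            dists = [sum(wd * (a - b) ** 2 for wd, a, b in zip(w, x, c))
                     for c in centers]
            labels[i] = dists.index(min(dists))
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def feedback_score(labels, must_link, cannot_link):
    """Fraction of the user's pairwise feedback that the clustering satisfies."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    return ok / (len(must_link) + len(cannot_link))

def pso_weights(X, k, must_link, cannot_link, n_particles=8, n_iter=10, seed=0):
    """Search feature weights in [0, 1] with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(w):
        return feedback_score(weighted_kmeans(X, k, w), must_link, cannot_link)

    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical toy data: feature 0 carries the grouping, feature 1 is noise.
X = [[0.0, 5.0], [0.2, -4.0], [0.1, 3.0], [1.0, 4.5], [1.1, -5.0], [0.9, 2.5]]
must_link = [(0, 1), (1, 2), (3, 4), (4, 5)]   # assumed user feedback
cannot_link = [(0, 3), (2, 5)]                 # assumed user feedback

weights, score = pso_weights(X, 2, must_link, cannot_link)
print("weights:", weights, "feedback satisfied:", score)
```

In an interactive setting, the feedback pairs would be collected again after each round and the weight search rerun, so the weights follow the user's stated preferences rather than being fixed in advance.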
