Outlier Detection and Semi-Supervised Clustering Algorithm Based on Shared Nearest Neighbors

Traditional cluster analysis is unsupervised. Its accuracy is affected by the choice of similarity measure and by outliers in the dataset, and it does not exploit prior knowledge that reflects users' requirements. This article therefore proposes an outlier detection and semi-supervised clustering algorithm based on shared nearest neighbors. The algorithm first detects outliers according to the number of nearest neighbors of each data point, and then applies semi-supervised clustering to the dataset processed by the outlier detection step. During clustering, it incorporates prior knowledge that has been expanded, and partitions the dataset according to the principle of graph segmentation. Simulation experiments on several UCI datasets show that the algorithm detects outliers effectively and achieves good clustering performance.
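
As an illustration of the outlier detection step, the following is a minimal sketch and not the article's actual procedure. The abstract does not give the scoring rule, so the sketch assumes a Jarvis-Patrick-style definition: two points are linked only if each appears in the other's k-nearest-neighbor list, the strength of a link is the number of neighbors they share, and a point whose total link strength falls below a threshold is flagged as an outlier. The function names and the parameters `k` and `min_score` are hypothetical.

```python
import math


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by Euclidean distance; break ties by index for determinism.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_scores(points, k):
    """Sum of shared-neighbor counts over each point's mutual k-NN links.

    Assumption: only mutual neighbors contribute, so a point that no other
    point lists among its k nearest neighbors gets a score of 0.
    """
    nn = knn_lists(points, k)
    scores = []
    for i in range(len(points)):
        total = 0
        for j in nn[i]:
            if i in nn[j]:  # mutual k-NN link
                total += len(nn[i] & nn[j])
        scores.append(total)
    return scores


def detect_outliers(points, k, min_score):
    """Return indices of points whose SNN score is below min_score (assumed rule)."""
    return [i for i, s in enumerate(snn_scores(points, k)) if s < min_score]


if __name__ == "__main__":
    # Two tight groups of four points and one isolated point (index 8).
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (5, 30)]
    print(snn_scores(points, k=3))
    print(detect_outliers(points, k=3, min_score=1))
```

In this toy data the isolated point has nearest neighbors of its own, but none of them lists it in return, which is why the mutual-link condition separates it from the two groups. The points that remain after filtering would then be passed to the semi-supervised clustering stage.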