Outlier detection has been studied extensively as an unsupervised learning problem. Purely unsupervised results are not always satisfactory, however, and they can be improved significantly when a few labeled points are available as supervision. In this paper, we use a limited amount of label information to detect outliers more accurately. The key to our approach is an objective function that penalizes poor clustering results and deviation from the known labels, and that restricts the number of outliers. The outliers are then obtained as a solution to the discrete optimization problem defined by this objective function. In this way, the method can detect meaningful outliers that existing unsupervised methods cannot identify.
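The abstract does not state the objective function itself, so the following is only a minimal sketch of what an objective with the three described terms could look like. It assumes a k-means-style squared-distance term for clustering quality, and the names `objective`, `assign_flags`, `gamma_count`, and `gamma_label` are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(points: List[Point], centroids: List[Point],
              is_outlier: List[bool], known_labels: Dict[int, bool],
              gamma_count: float, gamma_label: float) -> float:
    """Assumed three-term objective (illustrative, not the paper's formula).

    known_labels maps a point index to True (labeled outlier) or
    False (labeled normal); unlabeled points are simply absent.
    """
    # Clustering term: each non-outlier pays its squared distance
    # to the nearest centroid.
    clustering = sum(min(sq_dist(p, c) for c in centroids)
                     for p, out in zip(points, is_outlier) if not out)
    # Count term: each declared outlier costs gamma_count.
    count_penalty = gamma_count * sum(is_outlier)
    # Label term: each disagreement with a known label costs gamma_label.
    label_penalty = gamma_label * sum(
        1 for i, lab in known_labels.items() if is_outlier[i] != lab)
    return clustering + count_penalty + label_penalty


def assign_flags(points: List[Point], centroids: List[Point],
                 known_labels: Dict[int, bool],
                 gamma_count: float, gamma_label: float) -> List[bool]:
    """With centroids fixed, choose each point's flag to minimize its cost.

    The objective above is a sum of per-point costs once the centroids
    are fixed, so each flag can be chosen independently.
    """
    flags = []
    for i, p in enumerate(points):
        cost_in = min(sq_dist(p, c) for c in centroids)
        cost_out = gamma_count
        if i in known_labels:
            if known_labels[i]:
                cost_in += gamma_label   # labeled outlier kept as normal
            else:
                cost_out += gamma_label  # labeled normal flagged as outlier
        flags.append(cost_out < cost_in)
    return flags


# Toy data: one centroid at the origin, point 2 carries an "outlier" label.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
centroids = [(0.0, 0.0)]
unsupervised = assign_flags(points, centroids, {}, 9.0, 100.0)
supervised = assign_flags(points, centroids, {2: True}, 9.0, 100.0)
```

In this toy example, the unsupervised assignment flags only the far point `(5.0, 5.0)`, while the single label additionally flags `(0.0, 2.0)`, which distance alone would not have singled out. A full method would alternate such a flag assignment with re-estimating the centroids; that loop is omitted here.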