Multi-source Manifold Outlier Detection

Outlier detection is an important task in data mining, with practical applications ranging from fraud detection to public health. However, as multi-source data become increasingly common in real-world scenarios, outlier detection becomes even more challenging, because traditional single-source outlier detection techniques are not suitable for multi-source heterogeneous data. In this paper, we propose a general framework based on consistent representations to identify multi-source heterogeneous outliers. Exploiting the information compatibility among different sources, the proposed method incorporates manifold learning to obtain a shared representation space in which information-correlated representations are close along the manifold, while semantically complementary instances are close in Euclidean distance. Furthermore, multi-source outliers can be effectively identified in an affine subspace, which is learned through affine combinations of the shared representations from different sources in the feature-homogeneous space. Comprehensive empirical investigations confirm the promise of the proposed framework.
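To make the idea of scoring instances via an affine combination of per-source representations more concrete, here is a minimal sketch. It is not the authors' method. It assumes that each source's representations have already been mapped into a common shared space. The manifold-learning step is not shown, and the inputs are simply given as lists of vectors. The affine weights are also assumed to be fixed, for example uniform, rather than learned. As a simple stand-in score, the sketch uses each instance's squared deviation from its own affine consensus. The names `affine_consensus`, `inconsistency_score`, and `rank_outliers` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def affine_consensus(views: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Affine combination sum_v w_v * z_v, with weights rescaled to sum to 1.

    Assumption: weights are given (e.g. uniform); the paper learns its subspace.
    """
    total = sum(weights)
    if abs(total) < 1e-12:
        raise ValueError("affine weights must not sum to zero")
    w = [x / total for x in weights]
    dim = len(views[0])
    return [sum(w[v] * views[v][k] for v in range(len(views))) for k in range(dim)]


def inconsistency_score(views: Sequence[Vector], weights: Sequence[float]) -> float:
    """Sum of squared Euclidean distances from each source's vector to the consensus."""
    c = affine_consensus(views, weights)
    return sum(sum((z[k] - c[k]) ** 2 for k in range(len(c))) for z in views)


def rank_outliers(
    reps: Sequence[Sequence[Vector]], weights: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """reps[v][i] is instance i's shared-space representation from source v.

    Returns instance indices sorted from most to least inconsistent, plus the scores.
    """
    n = len(reps[0])
    scores = [
        inconsistency_score([reps[v][i] for v in range(len(reps))], weights)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Two hypothetical sources, three instances; instance 2 disagrees across sources.
    reps = [
        [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],    # source A
        [[0.1, 0.0], [1.0, 0.9], [-2.0, -2.0]],  # source B
    ]
    order, scores = rank_outliers(reps, weights=[1.0, 1.0])
    print(order[0], scores[2])
```

With uniform weights, instance 2 has consensus (0, 0), and its two source vectors each lie at squared distance 8 from it. It therefore ranks first with a score of 16.0. The consistent instances 0 and 1 score near zero. In the full framework, the shared space and the affine subspace would come from the learned model, not from fixed inputs.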