An Exceptional Reduction Algorithm for Outliers Analyzing in High-Dimension Space

Mining and analyzing outliers is of great importance in many applications, including network intrusion detection and credit card and telecom fraud detection. Existing outlier mining algorithms focus on detecting outliers but lack a valid approach for explaining and analyzing why those outliers are exceptional. To describe the exceptional features of a high-dimensional dataset in quantitative detail, this paper defines two concepts: the key attribute subspace of an outlier and the exceptional contribution degree of an attribute. Furthermore, we present the idea of an exceptional partition based on rough set theory. This leads to several efficient methods for explaining and analyzing outliers, of which this paper mainly discusses the exceptional reduction algorithm (ERDA) that we propose. ERDA offers a way to identify the origin of detected outliers and can help improve one's understanding of the whole dataset. The results of a complexity analysis and of experiments on real-world datasets show that the proposed algorithm is scalable and efficient.
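The abstract does not spell out how ERDA works, so what follows is only a minimal sketch of the textbook rough-set building block it relies on: the indiscernibility partition IND(B), which groups objects that agree on every attribute in a subset B. The helper `smallest_isolating_subsets`, the attribute names and the toy rows are hypothetical. They illustrate one plausible reading of a "key attribute subspace" (a smallest attribute set under which an outlier is discernible from every other object) and are not the paper's definitions or its ERDA algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]  # one object: attribute name -> discrete (discretized) value


def indiscernibility_partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[FrozenSet[int]]:
    """Standard rough-set IND(B): rows share a class iff they agree on every attribute in attrs."""
    classes: Dict[Tuple[Hashable, ...], Set[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(i)
    return [frozenset(c) for c in classes.values()]


def isolates(rows: Sequence[Row], attrs: Sequence[str], target: int) -> bool:
    """True if row `target` is alone in its class under attrs, i.e. discernible from all other rows."""
    for cls in indiscernibility_partition(rows, attrs):
        if target in cls:
            return len(cls) == 1
    return False


def smallest_isolating_subsets(rows: Sequence[Row], all_attrs: Sequence[str], target: int) -> List[List[str]]:
    """HYPOTHETICAL helper (assumption, not ERDA): all minimum-size attribute subsets isolating `target`.

    Brute force over attribute subsets, so the cost grows exponentially with len(all_attrs).
    """
    for k in range(1, len(all_attrs) + 1):
        found = [list(c) for c in combinations(all_attrs, k) if isolates(rows, c, target)]
        if found:
            return found
    return []  # target duplicates some other row on every attribute


# Hypothetical toy data: discretized card transactions; row 4 is the suspected outlier.
EXAMPLE_ROWS: List[Row] = [
    {"amount": "low", "country": "home", "hour": "day"},
    {"amount": "low", "country": "home", "hour": "night"},
    {"amount": "high", "country": "home", "hour": "night"},
    {"amount": "low", "country": "abroad", "hour": "day"},
    {"amount": "high", "country": "abroad", "hour": "night"},
    {"amount": "high", "country": "abroad", "hour": "day"},
]

if __name__ == "__main__":
    print(smallest_isolating_subsets(EXAMPLE_ROWS, ["amount", "country", "hour"], 4))
```

Running the file prints `[['country', 'hour']]`: no single attribute separates row 4 from the other rows, but the combination of `country` and `hour` does. Because the search is exhaustive, this sketch would not have the scalability the abstract reports for ERDA; it only makes the partition idea concrete.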
