Collusion Set Detection Through Outlier Discovery

The ability to identify collusive malicious behavior is critical in today's security environment. We pose the general problem of Collusion Set Detection (CSD): identifying sets of behavior that together satisfy some notion of “interesting behavior”. In this paper, we focus on a subset of the problem, called CSD′, by restricting our attention to outliers. In proposing a solution, we make the following novel research contributions. First, we propose a suitable distance metric, called the collusion distance metric, and formally prove that it is indeed a distance metric. Building on it, we propose a collusion distance-based outlier detection (CDB) algorithm that can identify the causal dimensions (n) responsible for the outlierness, and we demonstrate that it improves both precision and recall compared to Euclidean distance-based outlier detection. Second, we propose a solution to the CSD′ problem that relies on the semantic relationships among the causal dimensions.
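For orientation, the Euclidean baseline that the abstract compares against can be sketched as below. This is only an illustrative sketch of generic distance-based outlier detection, under stated assumptions: the function name `db_outliers`, the parameters `p` and `d`, and the sample points are hypothetical, and the fraction is computed over the other points only, which is a simplification. It does not implement the collusion distance metric or the CDB algorithm, neither of which is defined in this abstract.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def db_outliers(points, p, d):
    """Return indices of points for which at least a fraction p of the
    other points lie farther than d away (Euclidean distance).

    Assumption: this follows the general DB(p, D) idea of distance-based
    outliers; it is not the collusion-distance method of the paper.
    """
    n = len(points)
    outliers = []
    for i, x in enumerate(points):
        # Count how many other points are farther than d from x.
        far = sum(1 for j, y in enumerate(points) if j != i and dist(x, y) > d)
        if n > 1 and far >= p * (n - 1):
            outliers.append(i)
    return outliers


# Hypothetical data: four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = db_outliers(points, p=0.9, d=1.0)
print(result)  # [4]
```

A baseline like this flags a point as an outlier but does not say which dimensions caused it, which is the gap the CDB algorithm is described as addressing.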
