Data mining is one of the most effective methods for fraud detection. The problem matters in practice: 25% of organizations have suffered from economic crimes [1]. This paper presents a case study using real-world data from a large retail company. We identify symptoms of fraud by looking for outliers. To identify both the outliers and the context in which they appear, we learn a regression tree. For a given node, we identify the outliers within the set of examples covered by that node, and we define the context as the conjunction of the conditions on the path from the root to that node. Surprisingly, we observe that some outliers disappear and new ones appear at different nodes of the tree. From a business point of view, the outliers detected near the leaves of the tree are the most suspicious. These cases are difficult to detect because they are observed only in a specific context, defined by the set of rules associated with the node.
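The node-wise procedure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tree is a CART-style regression tree grown by exhaustive splits that reduce the sum of squared errors. Tukey's boxplot rule (flag values outside Q1 - 1.5·IQR and Q3 + 1.5·IQR) is an assumed stand-in for the outlier detector, which the abstract does not name. The function names, the depth and size limits, and the toy data are all hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed boxplot rule)."""
    if len(values) < 4:  # too few points for meaningful quartiles
        return []
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def sse(ys):
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, target, features):
    """Exhaustive search for the binary split that most reduces the SSE."""
    base = sse([r[target] for r in rows])
    best = None
    for f in features:
        # Every observed value except the largest is a candidate threshold.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            gain = base - sse([r[target] for r in left]) - sse([r[target] for r in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    return best


def outliers_per_node(rows, target, features, path=(), depth=0,
                      max_depth=2, min_size=8, report=None):
    """Grow the tree and, at every node, record the context (conjunction of
    conditions from the root) and the outliers among the examples it covers."""
    if report is None:
        report = []
    ys = [r[target] for r in rows]
    context = " AND ".join(path) or "(root)"
    report.append((context, sorted(rows[i]["id"] for i in tukey_outliers(ys))))
    if depth < max_depth and len(rows) >= min_size:
        split = best_split(rows, target, features)
        if split is not None and split[0] > 0:
            _, f, t, left, right = split
            for cond, subset in ((f"{f} <= {t}", left), (f"{f} > {t}", right)):
                outliers_per_node(subset, target, features, path + (cond,),
                                  depth + 1, max_depth, min_size, report)
    return report


if __name__ == "__main__":
    # Hypothetical toy data: two regions with very different typical amounts.
    # Transaction 9 (amount 300) is unremarkable globally but odd in region 0.
    amounts = [100, 102, 98, 101, 99, 103, 97, 100, 101, 300,
               1000, 1005, 995, 1002, 998, 1003, 997, 1001, 999, 1000]
    rows = [{"id": i, "region": 0 if i < 10 else 1, "amount": a}
            for i, a in enumerate(amounts)]
    for context, ids in outliers_per_node(rows, "amount", ["region"]):
        print(f"{context}: outliers {ids}")
```

With the toy data, transaction 9 is not flagged at the root, where the two regions inflate the interquartile range, but it is flagged at the node `region <= 0`. This mirrors the observation that some outliers only become visible once the context narrows. Recording every node, not only the leaves, makes it possible to compare how the set of outliers changes along each path.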
[1] Felix Naumann et al. Data fusion. CSUR, 2009.
[2] Varun Chandola et al. Anomaly detection: A survey. CSUR, 2009.
[3] Wei-Yin Loh et al. Classification and regression trees. WIREs Data Mining Knowl. Discov., 2011.
[4] Charu C. Aggarwal et al. Outlier Detection for Temporal Data: A Survey. IEEE Transactions on Knowledge and Data Engineering, 2014.
[5] K. Vanhoof et al. Data Mining for Fraud Detection: Toward an Improvement on Internal Control Systems? 2007.
[6] Shuchita Upadhyaya et al. Outlier Detection: Applications And Techniques. 2012.
[7] João Gama et al. Extração de conhecimento de dados: data mining [Knowledge extraction from data: data mining]. 2015.
[8] B. Ripley et al. Recursive Partitioning and Regression Trees. 2015.