Specification and Implementation of a Data Generator to Simulate Fraudulent User Behavior

Fraud is a widespread international problem for enterprises, and organizations increasingly use self-learning classifiers to detect it. Such classifiers need training data to successfully distinguish normal from fraudulent behavior. However, data containing authentic fraud scenarios is often not available to researchers. We have therefore implemented a data generation tool that simulates fraudulent and non-fraudulent user behavior within the purchase-to-pay business process of an ERP system. We identified fraud scenarios from the literature and implemented them as automated routines in SAP’s programming language ABAP. The generated data can be used to train fraud detection classifiers as well as to benchmark existing ones.