Big data meets process mining: implementing the alpha algorithm with map-reduce

Process mining is an approach for extracting process models from event logs. Given the distributed nature of modern information systems, event logs are likely to be distributed across different physical machines. Map-Reduce is a scalable approach for efficient computations on distributed data. In this paper, we present the design of a Map-Reduce implementation of the Alpha process mining algorithm that takes advantage of this scalability. We provide experimental results that show the performance and scalability of our implementation.
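
To illustrate how the Alpha algorithm can fit the Map-Reduce model, the following is a minimal single-machine sketch of its first step: computing the directly-follows relation and the ordering relations derived from it. It is not the design presented in this paper. The function names (map_trace, reduce_counts, directly_follows, footprint) and the representation of a log as a list of activity lists are assumptions made for illustration only, and the shuffle phase of a real Map-Reduce framework is simulated here by a plain in-memory loop.

```python
from collections import defaultdict
from itertools import chain


def map_trace(trace):
    """Map step (hypothetical): emit ((a, b), 1) for each pair of
    consecutive activities a, b in a single trace."""
    return [((a, b), 1) for a, b in zip(trace, trace[1:])]


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the values emitted per (a, b) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def directly_follows(log):
    """Run the map step on every trace, then reduce all emitted pairs.
    In a distributed setting each trace could be mapped on the machine
    that stores it; here everything runs in one process."""
    mapped = chain.from_iterable(map_trace(trace) for trace in log)
    return reduce_counts(mapped)


def footprint(df, a, b):
    """Classify the Alpha ordering relation between activities a and b,
    given the directly-follows counts df."""
    ab, ba = (a, b) in df, (b, a) in df
    if ab and ba:
        return "||"   # parallel: each directly follows the other
    if ab:
        return "->"   # causal: a is directly followed by b, never the reverse
    if ba:
        return "<-"   # reverse causal
    return "#"        # unrelated: neither directly follows the other
```

For example, for the log [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]], directly_follows yields a count of 1 for the pair ("a", "b"), and footprint classifies b and c as parallel ("||") because each directly follows the other in some trace. The remaining steps of the Alpha algorithm, which build places from these relations, are not covered by this sketch.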