Hadoop MapReduce-based Process Anomaly Detection in Smart Factory

This study proposes a method to analyze inventory shrinkage in a smart factory environment using the Internet of Things and Big Data. We developed an algorithm that searches for the point at which an object was lost or misread, in a parallel and distributed manner using the Hadoop MapReduce framework, and we implemented a web-based anomaly detection system. The developed system reports the loss/misreading position at each read point, the total number of losses/misreadings, losses/misreadings relative to total yield, and losses/misreadings by worker.

Introduction

The smart factory concept has emerged as a way to cope with increased competition in global manufacturing and to maintain future competitiveness. It involves connecting the entire factory to a network that creates a virtual world mirroring the conditions of the physical world, in order to optimize products as well as manufacturing processes and their control [1-3]. The core driving force of the smart factory is information and communications technology, specifically the Internet of Things (IoT), Big Data, cloud computing, and cyber-physical systems [4,5]. Applying RFID systems and sensor technology, the core technologies of the IoT, in a smart factory enables real-time collection of data on work-in-progress (WIP) movement, production process parameters, and the factory environment. Through a cyber-physical system, current factory conditions are detected and future scenarios can be predicted, making it possible to deal with potential problems and adjust production plans accordingly. If a quality or process problem occurs, its cause can be identified through Big Data analysis. However, even when technologies such as the IoT and Big Data are applied to production and logistics processes, it is impossible to completely prevent shrinkage of WIP or finished products due to loss, theft, or damage in the actual physical environment.
In this paper, we propose a method to identify problems such as loss, theft, or unrecognized objects in production and logistics processes by analyzing event logs collected through the IoT and stored in a Big Data repository. The Big Data repository is supported by the Hadoop Distributed File System (HDFS), and we use token replay [6], a conformance-checking technique from process mining, to detect deviations. The MapReduce framework is used to parallelize token replay.

Algorithm Design

In this section, we describe the process of distributing the events collected in Hadoop by ObjectID using MapReduce, performing token replay in parallel, and identifying anomalies. In this study, anomalies are restricted to the loss and misreading of objects.

Event Type

Figure 1 shows the event fields for objects collected by readers and sensors in an environment using Auto-ID tags. These fields include the ObjectID, which identifies the object, and the eventTime, which
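The parallel token-replay scheme described above can be illustrated with a small, self-contained sketch. This is not the paper's implementation: it simulates the map, shuffle, and reduce phases in plain Python rather than running on a Hadoop cluster, assumes a strictly sequential process model, and the field name `readPoint` and the model `PROCESS_MODEL` are illustrative assumptions not taken from the source.

```python
from collections import defaultdict

# Illustrative sequential process model: the ordered read points an object
# is expected to pass through (an assumption, not the paper's actual model).
PROCESS_MODEL = ["P1", "P2", "P3", "P4"]

def map_phase(event_log):
    """Map: for each event record, emit (ObjectID, (eventTime, readPoint))."""
    for object_id, event_time, read_point in event_log:
        yield object_id, (event_time, read_point)

def shuffle(pairs):
    """Group mapper output by key, as Hadoop's shuffle/sort phase would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(object_id, events):
    """Reduce: sort the object's events by time and replay them against the
    sequential model; expected read points with no matching event are
    reported as candidate loss/misreading positions."""
    trace = [read_point for _, read_point in sorted(events)]
    missing, i = [], 0
    for expected in PROCESS_MODEL:
        if i < len(trace) and trace[i] == expected:
            i += 1          # token consumed: object observed at this point
        else:
            missing.append(expected)  # no read here: loss or misreading
    return object_id, missing

def detect_anomalies(event_log):
    """Run the full simulated MapReduce job over an event log."""
    anomalies = {}
    for object_id, events in shuffle(map_phase(event_log)).items():
        _, missing = reduce_phase(object_id, events)
        if missing:
            anomalies[object_id] = missing
    return anomalies

# Example: OBJ-2 is never read at P3.
log = [
    ("OBJ-1", 1, "P1"), ("OBJ-1", 2, "P2"), ("OBJ-1", 3, "P3"), ("OBJ-1", 4, "P4"),
    ("OBJ-2", 1, "P1"), ("OBJ-2", 2, "P2"), ("OBJ-2", 4, "P4"),
]
print(detect_anomalies(log))  # {'OBJ-2': ['P3']}
```

Because reducers receive all events for one ObjectID, each object's trace can be replayed independently, which is what makes the per-object conformance check embarrassingly parallel across a Hadoop cluster. Full token replay on a general Petri net additionally tracks produced, consumed, and remaining tokens; the presence check per sequential step above is a deliberate simplification.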