A Study on Data Analysis Process Management System in MapReduce using BPM

MapReduce is a programming model for processing massive data sets on distributed systems, and it is used as an analysis model in both academia and industry. However, the developers who implement MapReduce programs often lack a sufficient understanding of the data analysis itself, while data analysts have difficulty programming MapReduce jobs for their various analyses on their own. As a result, it is difficult for developers to deliver the analysis output that analysts require. To close this gap between MapReduce developers and data analysts, this study proposes a new MapReduce analysis process management system based on BPM (Business Process Management). The system is designed to serve as a mutually complementary intermediary between MapReduce developers and analysts, and it makes it possible to respond flexibly to changes in the analysis procedure.
