An Executable Specification of Map-Join-Reduce Using Haskell

The Map-Join-Reduce programming model supports join operations among heterogeneous data sets by adding a Join module that processes multiway joins directly. In this paper, we give a rigorous description of Map-Join-Reduce in the form of an executable specification. First, we briefly describe the differences between Map-Join-Reduce and MapReduce. Then, we use Haskell to specify each module of the Map-Join-Reduce programming model and analyze the structure and function of each module. Finally, we test the specification on an example based on mall sales records. The executable specification helps developers understand the relationship between MapReduce and Map-Join-Reduce, and may serve as a basis for further development of the theory of related programming model design. Furthermore, the most important role of an executable specification is to make it possible to establish that the target informal or semi-formal model has the properties of interest. This paper is a step toward verifying such properties and, eventually, providing verified prototypes.
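To give a feel for what an executable specification of this kind might look like, the following is a minimal Haskell sketch of a map, join, and reduce pipeline. It is not the specification developed in the paper: the record types (`Sale`, `Product`), the function names, and the sample data are all hypothetical, and the sketch covers only a two-way inner join on a single machine, whereas Map-Join-Reduce targets multiway joins on clusters.

```haskell
module Main where

import qualified Data.Map.Strict as M

-- Hypothetical record types for a mall sales illustration.
type ProductId = Int
data Sale    = Sale    { saleProduct :: ProductId, saleAmount :: Int }
data Product = Product { productId :: ProductId, productCategory :: String }

-- Map phase: tag every record with its join key.
mapPhase :: (a -> k) -> [a] -> [(k, a)]
mapPhase key = map (\x -> (key x, x))

-- Join phase: group both keyed inputs, keep only the keys present in
-- both, and pair up the records that share a key (an inner join).
joinPhase :: Ord k => [(k, a)] -> [(k, b)] -> [(k, (a, b))]
joinPhase xs ys =
  [ (k, (a, b))
  | (k, (as, bs)) <- M.toList (M.intersectionWith (,) (group xs) (group ys))
  , a <- as
  , b <- bs
  ]
  where
    -- Collect values per key, preserving input order within each key.
    group :: Ord k => [(k, v)] -> M.Map k [v]
    group = M.fromListWith (flip (++)) . map (fmap (: []))

-- Reduce phase: re-key each joined record and sum the values per key.
reducePhase :: Ord k => (v -> (k, Int)) -> [v] -> M.Map k Int
reducePhase f = M.fromListWith (+) . map f

-- The whole pipeline: total sales amount per product category.
revenueByCategory :: [Sale] -> [Product] -> M.Map String Int
revenueByCategory sales products =
  reducePhase (\(s, p) -> (productCategory p, saleAmount s))
    (map snd (joinPhase (mapPhase saleProduct sales)
                        (mapPhase productId products)))

-- Made-up sample data; product 3 has no catalogue entry.
sampleSales :: [Sale]
sampleSales = [Sale 1 30, Sale 2 20, Sale 1 10, Sale 3 5]

sampleProducts :: [Product]
sampleProducts = [Product 1 "toys", Product 2 "books"]

main :: IO ()
main = print (M.toList (revenueByCategory sampleSales sampleProducts))
```

Because the join is an inner join, the sale that refers to product 3 is dropped. Writing each phase as a pure function keeps the types of the intermediate data explicit, which is one reason a functional language is a natural fit for this style of specification.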
