A Reactive Scheduling Strategy Applied On MapReduce OLAM Operators System

Combining data warehousing with data analysis techniques such as OLAP (Online Analytical Processing) and data mining on top of the Hadoop framework is an innovative way to process large volumes of data. However, this approach raises serious issues in scheduling and combining tasks. In this paper, we propose two strategies to address these issues: parallel OLAM (Online Analytical Mining) MapReduce operators and a reactive scheduling policy. The OLAM MapReduce operators divide jobs into two parts: the first includes all the operators used to create an OLAM cube, and the second includes those that exploit the cube through data mining algorithms. The proposed policy coordinates the workflow generated by these operators, relying on model-based events. Our simulation experiments show that the strategy has a cumulative effect: it reduces the execution time of the entire cluster with each request.
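The two-part job structure and the event-driven coordination described above can be illustrated with a minimal single-process sketch. It is not the operators or the scheduling policy proposed in the paper: the sample rows, the event names (`job_submitted`, `cube_ready`, `mining_done`), the `ReactiveScheduler` class, and the threshold filter standing in for a data mining algorithm are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows (region, product, sales); not data from the paper.
ROWS = [
    ("north", "tea", 10),
    ("north", "coffee", 5),
    ("south", "tea", 7),
    ("south", "tea", 3),
]


def map_cube(row):
    """Map step: emit one (cell, measure) pair for every group-by of the dimensions."""
    *dims, measure = row
    for k in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), k):
            # "*" marks a dimension that has been aggregated away.
            cell = tuple(d if i in kept else "*" for i, d in enumerate(dims))
            yield cell, measure


def reduce_cube(pairs):
    """Reduce step: sum the measures that fall into each cube cell."""
    cube = defaultdict(int)
    for cell, measure in pairs:
        cube[cell] += measure
    return dict(cube)


def mine_cube(cube, threshold):
    """Second-part operator (stand-in for a mining algorithm):
    keep the cells whose total reaches the threshold."""
    return {cell: total for cell, total in cube.items() if total >= threshold}


class ReactiveScheduler:
    """Toy event-driven coordinator: registered handlers run when an event fires."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # order in which events were emitted

    def on(self, event, handler):
        self.handlers[event].append(handler)

    def emit(self, event, payload):
        self.log.append(event)
        for handler in self.handlers[event]:
            handler(payload)


def run_pipeline(rows, threshold):
    scheduler = ReactiveScheduler()
    result = {}

    def build(rows):
        # Part 1: cube construction, expressed as map followed by reduce.
        pairs = (pair for row in rows for pair in map_cube(row))
        scheduler.emit("cube_ready", reduce_cube(pairs))

    def mine(cube):
        # Part 2: starts only in reaction to the "cube_ready" event.
        result["cube"] = cube
        result["frequent"] = mine_cube(cube, threshold)
        scheduler.emit("mining_done", result["frequent"])

    scheduler.on("job_submitted", build)
    scheduler.on("cube_ready", mine)
    scheduler.emit("job_submitted", rows)
    return result, scheduler.log
```

Calling `run_pipeline(ROWS, 15)` builds the cube first and triggers the mining step only once the cube is available, which is the ordering the abstract attributes to the scheduling policy. A real deployment would distribute the map and reduce steps across a Hadoop cluster rather than run them in one process.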
