Bringing Context to Apache Hadoop

One of the first challenges when deploying MapReduce over pervasive grids is that Apache Hadoop, the best-known MapReduce distribution, requires a highly structured environment such as a dedicated cluster or a cloud infrastructure. In pervasive environments, context-awareness becomes essential to coordinate resources (task scheduling, data placement, etc.) and to adapt them to the variable behavior of the environment. In this paper, we present our first efforts to improve Hadoop by introducing context-awareness into its scheduling algorithms. The experiments demonstrate that context-awareness allows Hadoop to scale better according to actual resource availability, thereby improving the task allocation pattern and rationalizing resource usage in a heterogeneous, dynamic network.

Keywords: context-awareness; MapReduce; Apache Hadoop; job scheduling.
