Towards a Byzantine Fault-Tolerant MapReduce Platform

Arbitrary failures are inherent to massively parallel computations such as those targeted by the MapReduce model, yet current MapReduce implementations provide no tools for tolerating Byzantine faults. As a result, the correctness of results produced at the end of long and costly computations cannot be certified. In this article we present an architecture that replicates tasks in the MapReduce model in order to guarantee the integrity of computations and to isolate faulty tasks. In a first performance study, we evaluated some of the mechanisms involved in replication. A second study, carried out with a prototype implementing the complete architecture, validated some of our design choices by showing that the overhead of Byzantine fault tolerance can be kept low.
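The abstract does not say how the outputs of replicated tasks are compared. Purely as an illustration, the sketch below assumes a scheme that is common in the Byzantine fault-tolerant MapReduce literature, not necessarily the authors' own. Each task is deterministic and runs on several replicas, and replicas are compared by a SHA-256 digest of their output. A result is accepted as soon as f+1 replicas agree, under the assumption that at most f replicas are faulty. Replicas whose digest disagrees with the accepted one are flagged so they can be isolated. The names `run_replicated_task`, `word_count_map` and `corrupted` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical type: a replica runs the task on an input split and returns its
# serialized output. A Byzantine replica may return arbitrary bytes.
Replica = Callable[[bytes], bytes]


def digest(output: bytes) -> str:
    """Replicas are compared by a digest of their output, not by the raw bytes."""
    return hashlib.sha256(output).hexdigest()


def run_replicated_task(
    split: bytes, replicas: List[Replica], f: int
) -> Tuple[Optional[bytes], List[int]]:
    """Run replicas one at a time until f+1 of them agree on the output digest.

    Assumes at most f faulty replicas and deterministic tasks, so f+1 identical
    digests mean at least one correct replica produced that output. Agreement
    cannot happen before f+1 runs, so extra replicas are only launched when
    the first ones disagree.
    Returns (accepted_output, indices_of_suspected_faulty_replicas), or
    (None, []) if the available replicas never reach agreement.
    """
    outputs: List[bytes] = []
    votes: Counter = Counter()  # digest -> number of replicas that produced it
    for replica in replicas:
        out = replica(split)
        outputs.append(out)
        d = digest(out)
        votes[d] += 1
        if votes[d] >= f + 1:
            # Replicas that disagreed with the accepted result are suspects.
            faulty = [i for i, o in enumerate(outputs) if digest(o) != d]
            return out, faulty
    return None, []


def word_count_map(split: bytes) -> bytes:
    # Toy deterministic map task (hypothetical): emit sorted "word<TAB>1" lines.
    words = split.decode().split()
    return "\n".join(f"{w}\t1" for w in sorted(words)).encode()


def corrupted(split: bytes) -> bytes:
    # Simulated Byzantine replica (hypothetical): ignores its input.
    return b"garbage"


if __name__ == "__main__":
    replicas = [word_count_map, corrupted, word_count_map, word_count_map]
    out, faulty = run_replicated_task(b"b a b", replicas, f=1)
    print(out)     # b'a\t1\nb\t1\nb\t1'
    print(faulty)  # [1]
```

With f=1, the first two replicas disagree, so a third one is launched. It matches the first, the result is accepted, the corrupted replica (index 1) is flagged, and the fourth replica is never run. This reflects the trade-off the abstract mentions: extra replicas are paid for only when a fault is actually observed.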
