Methods and tools for challenging experiments on Grid'5000: a use case on electromagnetic hybrid simulation

In the field of Distributed Systems and High Performance Computing, experimental validation is heavily used in place of an analytic approach, which is no longer feasible given the complexity of those systems in terms of software and hardware. Researchers therefore face many challenges when conducting their experiments, which makes the process costly and time consuming. Although world-scale platforms exist and virtualization technologies make it possible to multiplex hardware, experiments are most of the time limited in size because of the difficulty of performing them at large scale. The level of technical skill required to set up an appropriate experimental environment has risen with the ever-increasing complexity of software stacks and hardware. As a result, researchers under pressure to publish and present their results resort to ad hoc methodologies. Hence, experiments are difficult to track and preserve, which prevents future reproduction. A variety of tools have been proposed to address this complexity in experimentation. They were motivated by the need to provide and encourage a sounder experimental process; however, those tools primarily addressed much simpler scenarios, such as single-machine or client/server setups. In the context of Distributed Systems and High Performance Computing, the objective of this thesis is to make complex experiments easier to perform, control, repeat, and archive. In this thesis we propose two tools for conducting experiments that demand a complex software stack and large scale. The first tool is Expo, which makes it possible to efficiently control the dynamic part of an experiment, that is, the experiment workflow, the monitoring of tasks, and the collection of results. Expo features a description language that makes setting up an experiment with distributed systems less painful.
Comparisons against other approaches, scalability tests, and use cases are presented in this thesis and demonstrate the advantages of our approach. The second tool is called Kameleon, which addresses the static part of an experiment, meaning the software stack and its configuration. Kameleon is a software appliance builder that makes it possible to describe and control the entire process of constructing a software stack for experimentation. The main contribution of Kameleon is to make the setup of complex software stacks easy and to guarantee their later reconstruction.
