Improving Probe Usability

In our research on characterizing individual computing nodes with Software Probes, one remaining obstacle to probing nodes quickly was the size of the checkpoints. This article presents the approaches we used to reduce the size of the Probes. We demonstrate that in some cases the set of Probes of an application can be reduced by up to 95 % of its original size, which makes our approach to machine characterization practical even over slow networks such as the Internet.
