Robustness analysis of multiprocessor schedules

Tasks executing on general-purpose multiprocessor platforms exhibit variations in their execution times. Schedules for such platforms must therefore explicitly account for robustness, i.e., tolerance to these fluctuations. This work aims to quantify the robustness of schedules of directed acyclic graphs (DAGs) on multiprocessors by defining probabilistic robustness metrics, and presents a new robustness analysis approach for computing these metrics. Stochastic execution times of tasks are used to compute completion-time distributions, from which the metrics are then derived. To overcome the difficulties of applying the max operation to distributions, a new curve-fitting approach is presented that derives a distribution from a combination of analytical results and a limited number of simulation results. The approach has been validated on schedules of time-critical applications in ASML wafer scanners.
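The pipeline described above can be illustrated on a toy schedule. It runs from per-task execution-time distributions, to completion-time distributions, to a probabilistic robustness metric. The sketch below is not the authors' curve-fitting method. Instead, it contrasts the two ingredients that such a method combines: plain Monte Carlo simulation, and an analytical treatment of the max operation. For the analytical part it uses Clark's classical moment-matching formula for the maximum of two Gaussians. Several elements are illustrative assumptions and do not come from the work itself: the four-task schedule, its independent Gaussian execution times, the deadlines, the metric P(makespan <= deadline), and all function names.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical schedule: task -> (mean, standard deviation) of its execution
# time, assumed Gaussian and mutually independent (illustrative values only).
TASKS = {"A": (4.0, 0.5), "B": (8.0, 1.5), "C": (5.0, 1.0), "D": (2.0, 0.3)}
# Predecessors merge DAG dependencies with the processor ordering fixed by the
# schedule (assumed mapping: A then C on P1, B on P2, D waits for B and C).
PREDECESSORS = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"]}
ORDER = ["A", "B", "C", "D"]  # a topological order of the graph above
STD = NormalDist()  # standard normal, used for phi and Phi


def sample_makespan(rng):
    """Draw one execution time per task and propagate completion times."""
    finish = {}
    for task in ORDER:
        mean, sd = TASKS[task]
        duration = max(0.0, rng.gauss(mean, sd))  # clip (rare) negative draws
        start = max((finish[p] for p in PREDECESSORS[task]), default=0.0)
        finish[task] = start + duration
    return max(finish.values())


def monte_carlo(n, seed=1):
    """Makespan samples from n independent simulation runs."""
    rng = random.Random(seed)
    return [sample_makespan(rng) for _ in range(n)]


def clark_max(m1, s1, m2, s2, rho=0.0):
    """Clark's exact mean and std dev of max(X1, X2) for Gaussian X1, X2."""
    a = math.sqrt(s1**2 + s2**2 - 2 * rho * s1 * s2)
    alpha = (m1 - m2) / a
    phi, cdf = STD.pdf(alpha), STD.cdf(alpha)
    mean = m1 * cdf + m2 * (1 - cdf) + a * phi
    second = ((m1**2 + s1**2) * cdf + (m2**2 + s2**2) * (1 - cdf)
              + (m1 + m2) * a * phi)
    return mean, math.sqrt(max(second - mean**2, 0.0))


def analytical_makespan():
    """Propagate (mean, sd) through the example DAG: sums add moments, the
    join uses Clark's formula and treats the result as Gaussian again."""
    (m_a, s_a), (m_b, s_b), (m_c, s_c), (m_d, s_d) = (TASKS[t] for t in ORDER)
    c_mean, c_sd = m_a + m_c, math.hypot(s_a, s_c)     # finish time of C
    j_mean, j_sd = clark_max(c_mean, c_sd, m_b, s_b)   # start of D: max(B, C)
    return j_mean + m_d, math.hypot(j_sd, s_d)         # finish of D = makespan


def robustness_mc(samples, deadline):
    """Fraction of simulated runs whose makespan meets the deadline."""
    return sum(s <= deadline for s in samples) / len(samples)


def robustness_gauss(mean, sd, deadline):
    """P(makespan <= deadline) under a Gaussian fit of the makespan."""
    return NormalDist(mean, sd).cdf(deadline)


if __name__ == "__main__":
    samples = monte_carlo(20_000)
    mean, sd = analytical_makespan()
    print(f"Monte Carlo mean/sd: {statistics.fmean(samples):.3f} "
          f"{statistics.stdev(samples):.3f}")
    print(f"Clark       mean/sd: {mean:.3f} {sd:.3f}")
    for deadline in (12.0, 13.0, 14.0):
        print(deadline, robustness_mc(samples, deadline),
              round(robustness_gauss(mean, sd, deadline), 4))
```

Clark's formula gives the exact mean and variance of the maximum of two Gaussians. However, the maximum itself is not Gaussian. Re-approximating it as Gaussian at every join therefore introduces an error that can compound in larger DAGs, especially where completion times become correlated through shared predecessors. Monte Carlo simulation avoids this error, but it needs many runs to resolve tail probabilities. This trade-off illustrates the difficulty with the max operation mentioned above.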
