Comparing Three Clustering-based Scheduling Methods for Energy-Aware Rapid Design of MP2SoCs

In recent years, the Electronic Design Automation (EDA) community has shifted its focus from performance to energy efficiency. Consequently, energy consumption has become a key criterion during Design Space Exploration (DSE). Finding a trade-off between energy consumption and performance early in the design flow, while still meeting time-to-market constraints, is a design challenge for EDA tools. In this paper, we propose the Energy-aWAre Rapid Design of MP2SoC (EWARDS) framework. EWARDS aims at exploring, at design time, the performance and energy capabilities of modern Massively Parallel Multi-Processor Systems-on-Chip (MP2SoC). The key contribution of the framework is an energy-aware scheduling process, named PREESMPE, that combines state-of-the-art power management techniques with clustering-based scheduling. The scheduling process is integrated into a Model-Driven Engineering (MDE)-based DSE approach to optimize both performance and energy efficiency in MP2SoC. Moreover, EWARDS extends the Modeling and Analysis of Real-Time and Embedded systems (MARTE) profile with power aspects of MP2SoC systems, providing a high-level design entry. To demonstrate the efficiency of the proposed approach, we conducted experiments using the H.263 codec and the FFT algorithm. The results show that the energy-aware scheduling process can effectively save energy in MP2SoC systems. They also confirm that our MDE-based approach accelerates the DSE process while generating energy-efficient design decisions.
