Run-time parallelization switching for resource optimization on an MPSoC platform

The recent growth of multimedia applications on mobile terminals has raised the need for flexible and scalable computing platforms that can deliver considerable (application-specific) computational performance within a low cost and a low energy budget. A multiprocessor system-on-chip (MPSoC), designed with a multidisciplinary approach that addresses application mapping, platform architecture and run-time management together, provides multiple heterogeneous, flexible processing elements for this purpose. In an MPSoC, the run-time manager takes design-time exploration information as input and selects an active Pareto point based on the quality requirement and the available platform resources, where each Pareto point corresponds to a particular parallelization of the target application. To make the best use of the system's scalability and to take the application's flexibility a step further, the resource management and Pareto point selection decisions need to be adjustable at run-time. This work experiments with run-time Pareto point switching for the MPEG-4 encoder. It involves design-time exploration, followed by embedding two parallelizations of the MPEG-4 encoder into a single component and enabling run-time switching between them. This gives run-time control over the performance-cost trade-off and over the allocation and deallocation of hardware resources. The new system can encode each video frame with a different parallelization. At the sequence encoding level, the results offer a number of operating points on the Pareto curve in between the previous ones. Depending on the quality request, the run-time manager can improve application performance by up to 50% or save up to 15% of memory bandwidth.
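The selection step described above can be illustrated with a minimal sketch. It is not the authors' run-time manager: the `ParetoPoint` fields, the point names `par_a` and `par_b`, the numeric values, and the policy of choosing the feasible point with the lowest memory bandwidth are all assumptions made for illustration. The sketch only shows the general idea of choosing, for each frame, one of two design-time parallelizations that fits the currently available processing elements and the requested frame time.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ParetoPoint:
    """One design-time operating point (all fields are hypothetical)."""
    name: str
    processors: int        # processing elements the parallelization needs
    frame_time_ms: float   # time to encode one frame
    bandwidth: float       # memory bandwidth cost, arbitrary units


# Two illustrative parallelizations; the numbers are invented, not measured.
POINTS = [
    ParetoPoint("par_a", processors=2, frame_time_ms=40.0, bandwidth=85.0),
    ParetoPoint("par_b", processors=4, frame_time_ms=20.0, bandwidth=100.0),
]


def select_point(
    points: Sequence[ParetoPoint],
    available_processors: int,
    max_frame_time_ms: float,
) -> Optional[ParetoPoint]:
    """Return the feasible point with the lowest bandwidth, or None."""
    feasible = [
        p for p in points
        if p.processors <= available_processors
        and p.frame_time_ms <= max_frame_time_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.bandwidth)


def plan_sequence(
    points: Sequence[ParetoPoint],
    frames: Sequence[Tuple[int, float]],
) -> List[Optional[str]]:
    """Pick a point per frame from (available_processors, max_frame_time_ms)."""
    plan = []
    for available_processors, max_frame_time_ms in frames:
        chosen = select_point(points, available_processors, max_frame_time_ms)
        plan.append(chosen.name if chosen else None)
    return plan
```

Because the choice is made per frame, a sequence can mix both parallelizations, which is how intermediate operating points at the sequence level can arise. A real manager would also have to account for the cost of switching and of allocating or releasing resources, which this sketch ignores.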
