Performance Estimation of Pipelined MultiProcessor System-on-Chips (MPSoCs)

The pipelined MPSoC paradigm (processors connected in a pipeline) is well suited to the data-flow nature of multimedia applications. Design space exploration is often performed to optimize the execution time, latency, or throughput of a pipelined MPSoC, where the design variants are the processor configurations that arise from the customizable options in each processor. Since there can be billions of combinations of processor configurations (design points), the challenge is to quickly estimate the performance metrics of those design points. Hence, in this article, we propose analytical models to estimate the execution time, latency, and throughput of a pipelined MPSoC's design points, avoiding slow full-system cycle-accurate simulation of every design point. To use these analytical models effectively, the latencies of the individual processor configurations must be available. We propose two estimation methods (PS and PSP) that quickly gather the latencies of processor configurations with a reduced number of simulations. The PS method simulates every processor configuration once, while the PSP method simulates only a subset of the processor configurations and then uses a processor analytical model to estimate the latencies of the remaining ones. We experimented with several pipelined MPSoCs executing typical multimedia applications (JPEG encoder/decoder, MP3 encoder, and H.264 encoder). Our results show that the analytical models with the PS and PSP methods had maximum absolute errors of 12.95 percent and 18.67 percent, respectively, and minimum fidelities of 0.93 and 0.88, respectively. The design spaces of the pipelined MPSoCs ranged from 10^12 to 10^18 design points, so simulating all design points would take years and is infeasible. Compared to the PS method, the PSP method reduced simulation time from days to several hours.
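
The abstract does not state the analytical models themselves, so the following is only a minimal sketch of the general idea, using a textbook first-order model of a linear pipeline rather than the authors' models. It assumes constant per-stage latencies, no stalls, and unbounded buffering between stages; the function names `design_space_size` and `estimate_pipeline` are hypothetical.

```python
from math import prod


def design_space_size(configs_per_processor):
    """Number of design points: the product of the per-processor
    configuration counts (one configuration chosen per processor)."""
    return prod(configs_per_processor)


def estimate_pipeline(stage_latencies, iterations):
    """First-order estimate for one design point of a linear pipeline.

    ASSUMPTION (not from the source): each stage has a constant latency
    in cycles per iteration, and stages never stall on full buffers.
    Returns (execution_time, latency, throughput).
    """
    latency = sum(stage_latencies)      # one item passing through every stage
    period = max(stage_latencies)       # the slowest stage sets the steady-state rate
    throughput = 1.0 / period           # items completed per cycle
    execution_time = latency + (iterations - 1) * period
    return execution_time, latency, throughput


if __name__ == "__main__":
    # Four processors with 1000 configurations each give 10^12 design points.
    print(design_space_size([1000, 1000, 1000, 1000]))
    # Per-stage latencies would come from simulation (PS) or from a
    # processor model (PSP); the numbers here are made up for illustration.
    print(estimate_pipeline([3, 5, 2], iterations=10))
```

Under these assumptions, evaluating a design point costs a sum and a maximum over the stage latencies, which is why the per-configuration latencies are the only inputs that need simulation.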
