On self-timed ring for consistent mapping and maximum throughput

Multiprocessor System-on-Chip (MPSoC) designs that employ self-timed techniques are becoming increasingly attractive because they can exploit the high parallelism of applications. Many research efforts have studied self-timed techniques at the hardware layer. However, these results cannot be applied to system synthesis; in particular, how to correctly and optimally map an application represented by a Data Flow Graph (DFG) onto a self-timed ring architecture remains unknown. The self-timed ring (STR) is a popular and easy-to-implement architecture. This paper establishes a series of theorems on setting the initial configuration to achieve correct mappings, together with formulas for calculating the corresponding throughput of the STR. Based on these results, a correct initial configuration of the STR can be obtained. An algorithm presented in the paper also finds the best initial configuration, that is, the one achieving the maximum throughput of the STR. Examples show that the maximum-throughput algorithm improves throughput by 51.11% compared with non-optimized configurations.
