Scalable communication architectures for massively parallel hardware multi-processors

Modern embedded applications in many fields impose stringent and continuously growing functional and parametric demands. To serve these applications adequately, massively parallel multi-processor systems on a single chip (MPSoCs) are required. This paper addresses the design of scalable communication architectures for massively parallel hardware multi-processors targeting highly demanding applications. We demonstrate that in massively parallel hardware multi-processors, the influence of the communication network on both throughput and circuit area dominates that of the processors, while the traditionally used flat communication architectures do not scale well as parallelism increases. We therefore propose highly optimized, application-specific, partitioned hierarchical communication architectures that exploit the regularity and hierarchy of the actual information flows of a given application. We developed corresponding communication architecture synthesis strategies and incorporated them into our quality-driven, model-based multi-processor design methodology and its automated architecture exploration framework. Using this framework, we performed a large series of architecture synthesis experiments, some of which are presented in this paper. The results demonstrate many features of the synthesized communication architectures and show that our method and framework can efficiently synthesize highly scalable communication architectures, even for high-end massively parallel multi-processors that must satisfy extremely stringent computational demands.
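The scaling argument can be made concrete with a toy cost model. The sketch below is our own simplification, not the paper's synthesis method: it counts crosspoints, a rough proxy for interconnect area, in a single flat P x B crossbar versus a hypothetical two-level organization with `g` partitions. It assumes that most traffic stays inside a partition (as it would when the application's information flows are regular) and that inter-partition traffic passes through one `g x g` global switch. The function names and the parameter `g` are illustrative only.

```python
def flat_crosspoints(p: int, b: int) -> int:
    """Crosspoint count of one flat crossbar linking p processors to b memory banks."""
    return p * b


def partitioned_crosspoints(p: int, b: int, g: int) -> int:
    """Crosspoint count of a hypothetical two-level organization.

    Assumption: processors and banks are split evenly into g partitions,
    each served by a local (p/g) x (b/g) crossbar, and the partitions are
    joined by a single g x g global switch.
    """
    if g <= 0 or p % g or b % g:
        raise ValueError("g must be positive and divide both p and b")
    local_level = g * (p // g) * (b // g)  # g local crossbars
    global_level = g * g                   # one switch between partitions
    return local_level + global_level


if __name__ == "__main__":
    # Compare growth as parallelism increases, with a fixed partition count.
    for p in (16, 64, 256):
        flat = flat_crosspoints(p, p)
        part = partitioned_crosspoints(p, p, g=8)
        print(f"P={p:4d}: flat={flat:6d}  partitioned(g=8)={part:6d}")
```

Under these assumptions the flat cost grows as P x B, while the partitioned cost grows as P x B / g + g^2, so the gap widens as parallelism rises. Real architectures must also account for wiring, latency, and actual traffic patterns, which this model ignores.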
