A study of 3D Network-on-Chip design for data parallel H.264 coding

In this paper, we study and analyze different Network-on-Chip (NoC) designs for MPEG-4/H.264 coding. We analyze the H.264 encoding and decoding processes, discuss the parallelism available in H.264, and use an open-source encoding program as a case study. The contribution of this paper lies in the NoC design method and the performance evaluation of a data-parallel H.264 coder. Our study shows that inter-thread data dependencies arising from shared reads and writes are performance bottlenecks. We explore different non-uniform cache access NoC designs and analyze two-dimensional (2D) and three-dimensional (3D) NoCs in terms of hop count and heat dissipation. We present benchmark results obtained with a cycle-accurate full-system simulator running realistic workloads. Experiments show that, under different workloads, the average network latencies of two 3D NoC designs are reduced by up to 34% compared with the 2D NoC. The results also show that heat dissipation is a trade-off to consider when improving the performance of 3D ICs. Our analysis and experimental results provide a guideline for designing efficient 3D NoCs for data-parallel H.264 coding applications.
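To illustrate why hop count is a natural metric for comparing 2D and 3D NoCs, the following minimal sketch computes the average hop count between distinct nodes of a mesh. It is not the paper's evaluation method: the mesh topology, the minimal (Manhattan-distance) routing, uniform traffic, and the 8x8 and 4x4x4 sizes are all assumptions chosen only for illustration, and the function name `average_hop_count` is hypothetical.

```python
from itertools import product


def average_hop_count(dims):
    """Average minimal hop count between distinct nodes of a mesh.

    Assumptions (not from the paper): a regular mesh with the given
    dimensions, minimal routing (hops = Manhattan distance), and
    uniform traffic between every ordered pair of distinct nodes.
    """
    nodes = list(product(*(range(d) for d in dims)))
    total_hops = 0
    pair_count = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue  # a node does not send to itself
            total_hops += sum(abs(a - b) for a, b in zip(src, dst))
            pair_count += 1
    return total_hops / pair_count


if __name__ == "__main__":
    # Hypothetical 64-node configurations: one planar, one stacked.
    print("2D 8x8 mesh:  ", average_hop_count((8, 8)))
    print("3D 4x4x4 mesh:", average_hop_count((4, 4, 4)))
```

Under these assumptions, the stacked 4x4x4 arrangement gives a lower average hop count than the planar 8x8 one for the same number of nodes. The sketch says nothing about heat dissipation, which the abstract identifies as the opposing consideration.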
