High Level Synthesis of Complex Applications: An H.264 Video Decoder

High level synthesis (HLS) is gaining wider acceptance for hardware design because of its higher productivity and better design space exploration features. In recent years, HLS techniques and design flows have also advanced significantly, and as a result, many new FPGA designs are developed with HLS. However, despite the many studies that use HLS, the applications studied remain generally small and simple, and it is not well understood how to design and optimize for HLS with large, complex reference code. Typical HLS benchmark applications contain between 100 and 1400 lines of code and about 20 sub-functions, whereas typical input applications may contain many times more code and functions. To study such complex applications, we present a case study using HLS for a full H.264 decoder: an application with over 6000 lines of code and over 100 functions. We share our experience with code conversion for synthesizability, various HLS optimizations, HLS limitations when dealing with complex input code, and general design insights. Through our optimization process, we achieve 34 frames/s at 640x480 resolution (480p). To enable future study and benefit the research community, we open-source our synthesizable H.264 implementation.
