Design of Image Processing Embedded Systems Using Multidimensional Data Flow

Contents: Introduction; Design of Image Processing Applications; Fundamentals and Related Work; Electronic System Level Design with SystemCoDesigner; Windowed Data Flow; Memory Mapping Functions for Efficient Implementation of WDF Edges; Buffer Analysis for Complete Application Graphs; Multidimensional Communication Synthesis; Conclusion.
