High level synthesis for non-manifest digital signal processing applications

In this thesis we show the feasibility of Coarse Grained Data Flow Machines for high-throughput streaming non-manifest applications. The architecture of the Coarse Grained Data Flow Machine is derived from the classical data flow architecture, and its processing elements are scheduled dynamically in hardware. Since the implementation of such an architecture is strongly application dependent, a design flow and supporting software tools are provided. These give application designers the means to tune the number of processing elements, the buffer sizes, and the latencies of the architecture.
