Compiling real-time digital signal processing applications onto multiprocessor systems

The goal of this research is to develop a set of Computer-Aided Design (CAD) tools that support the real-time implementation of Digital Signal Processing (DSP) applications on multiple programmable processors. The work has resulted in a complete DSP design environment, called McDAS, which compiles high-level DSP applications directly into parallel code for MIMD multiprocessors. One of the major challenges of the research is assigning and scheduling tasks onto the processors so as to maximize the throughput of the resulting implementation while accounting for interprocessor communication delays and the resource constraints imposed by the target architecture. The scheduler in McDAS exploits pipelining, retiming, and parallel execution simultaneously, which allows the environment to support efficiently a wide range of applications with different types of concurrency. Users can invoke the scheduler with different architecture configurations to explore implementation trade-offs. The code generator is likewise retargetable, both to different multiprocessor architectures and to different core processors. Data buffers and synchronizations are inserted automatically to ensure correct execution. The final implementation can be used for simulation speedup or for real-time processing. Results on a set of benchmarks demonstrate McDAS's ability to achieve near-optimal speedups across a wide range of applications.
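To make the scheduling problem concrete, the following is a minimal sketch of greedy list scheduling for a task graph on identical processors with a fixed interprocessor communication delay. It is not the McDAS scheduler: it handles a single iteration of the graph and ignores pipelining, retiming, and resource constraints. The function name `list_schedule`, its parameters, and the example graph are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def list_schedule(tasks, edges, num_procs, comm_delay):
    """Greedy list scheduling of a task DAG (illustrative sketch only).

    tasks:      dict mapping task name -> execution time
    edges:      list of (producer, consumer) precedence pairs
    num_procs:  number of identical processors (assumed)
    comm_delay: delay added when producer and consumer run on
                different processors (assumed constant)
    Returns a dict mapping task name -> (processor, start, finish).
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for producer, consumer in edges:
        preds[consumer].append(producer)
        succs[producer].append(consumer)
        indegree[consumer] += 1

    ready = sorted(t for t in tasks if indegree[t] == 0)
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placed = {}

    while ready:
        task = ready.pop(0)
        best = None
        for p in range(num_procs):
            # Inputs from another processor arrive comm_delay later.
            data_ready = max(
                (placed[u][2] + (0 if placed[u][0] == p else comm_delay)
                 for u in preds[task]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        finish = start + tasks[task]
        placed[task] = (p, start, finish)
        proc_free[p] = finish
        # Release successors whose predecessors are now all scheduled.
        for v in succs[task]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return placed


# Hypothetical four-task graph: a feeds b and c, which both feed d.
tasks = {"a": 2, "b": 3, "c": 3, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
schedule = list_schedule(tasks, edges, num_procs=2, comm_delay=1)
makespan = max(finish for _, _, finish in schedule.values())
print(schedule, makespan)
```

With a communication delay of 1, task `c` moves to the second processor only after paying the delay, and the schedule finishes at time 7; with a delay of 0 the same graph finishes at time 6. The gap shows why a scheduler has to weigh communication cost against the benefit of parallel execution.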
