A kernel interleaved scheduling method for streaming applications on soft-core vector processors

Massively parallel networks of highly efficient, high-performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures are highly susceptible to pipeline bubbles resulting from data and control hazards. Currently, the only way to mitigate these is to manually interleave application tasks on each datapath, since no suitable automated interleaving approach exists. In this paper, we describe a new automated, integrated mapping/scheduling approach that assigns algorithm tasks to processors, together with a new low-complexity list scheduling technique that generates the interleaved schedules. When applied to a spatial Fixed-Complexity Sphere Decoding (FSD) detector for next-generation Multiple-Input Multiple-Output (MIMO) systems, the resulting schedules achieve real-time performance for IEEE 802.11n systems on a network of 16-way SIMD processors on FPGA, enable a better performance/complexity balance than current approaches, and produce results comparable to handcrafted implementations.
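
To illustrate why interleaving removes pipeline bubbles, the following is a minimal sketch of a greedy list scheduler for a single deeply pipelined datapath. It is not the scheduling technique described in the paper. It assumes a simplified model in which each kernel instance is a chain of dependent instructions, one instruction issues per cycle, and a result becomes usable `latency` cycles after issue. The names `interleave`, `count_bubbles`, `chains`, and `latency` are hypothetical.

```python
def interleave(chains, latency):
    """Greedy list schedule of dependent-instruction chains on one datapath.

    chains  : list of chain lengths, one entry per kernel instance (assumed model)
    latency : cycles before an instruction's result can feed the next
              instruction of the same chain (assumed pipeline depth)
    Returns a list with one entry per cycle: (kernel, op_index) or None (bubble).
    """
    remaining = list(chains)          # instructions left in each chain
    ready_at = [0] * len(chains)      # earliest cycle each chain may issue again
    schedule = []
    cycle = 0
    while any(remaining):
        # Chains whose next instruction has no outstanding data hazard.
        ready = [k for k, r in enumerate(remaining)
                 if r > 0 and ready_at[k] <= cycle]
        if ready:
            # Priority: longest remaining chain first, lowest index on ties.
            k = max(ready, key=lambda i: (remaining[i], -i))
            schedule.append((k, chains[k] - remaining[k]))
            remaining[k] -= 1
            ready_at[k] = cycle + latency
        else:
            schedule.append(None)     # nothing ready: pipeline bubble
        cycle += 1
    return schedule


def count_bubbles(schedule):
    """Number of cycles in which no instruction was issued."""
    return sum(1 for slot in schedule if slot is None)


if __name__ == "__main__":
    single = interleave([4], latency=4)
    four = interleave([4, 4, 4, 4], latency=4)
    print(len(single), count_bubbles(single))
    print(len(four), count_bubbles(four))
```

Under this model, a single four-instruction chain on a datapath with a latency of four cycles leaves most issue slots empty, whereas four independent chains interleaved on the same datapath keep every slot occupied.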
