Rigel

Image processing algorithms implemented using custom hardware or FPGAs of can be orders-of-magnitude more energy efficient and performant than software. Unfortunately, converting an algorithm by hand to a hardware description language suitable for compilation on these platforms is frequently too time consuming to be practical. Recent work on hardware synthesis of high-level image processing languages demonstrated that a single-rate pipeline of stencil kernels can be synthesized into hardware with provably minimal buffering. Unfortunately, few advanced image processing or vision algorithms fit into this highly-restricted programming model. In this paper, we present Rigel, which takes pipelines specified in our new multi-rate architecture and lowers them to FPGA implementations. Our flexible multi-rate architecture supports pyramid image processing, sparse computations, and space-time implementation tradeoffs. We demonstrate depth from stereo, Lucas-Kanade, the SIFT descriptor, and a Gaussian pyramid running on two FPGA boards. Our system can synthesize hardware for FPGAs with up to 436 Megapixels/second throughput, and up to 297x faster runtime than a tablet-class ARM CPU.

[1]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[2]  Frédo Durand,et al.  Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph..

[3]  Edward A. Lee,et al.  Multidimensional synchronous dataflow , 2002, IEEE Trans. Signal Process..

[4]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Rudy Lauwereins,et al.  Cyclo-static data flow , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[6]  J.-Y. Bouguet,et al.  Pyramidal implementation of the lucas kanade feature tracker , 1999 .

[7]  Edward A. Lee,et al.  Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.

[8]  Jonathan Ragan-Kelley,et al.  Automatically scheduling halide image processing pipelines , 2016, ACM Trans. Graph..

[9]  Louise H. Crockett,et al.  The Zynq Book: Embedded Processing with the Arm Cortex-A9 on the Xilinx Zynq-7000 All Programmable Soc , 2014 .

[10]  Pat Hanrahan,et al.  GRAMPS: A programming model for graphics pipelines , 2009, ACM Trans. Graph..

[11]  Charles E. Leiserson,et al.  Retiming synchronous circuitry , 1988, Algorithmica.

[12]  Marc Levoy,et al.  The Frankencamera: an experimental platform for computational photography , 2010, SIGGRAPH 2010.

[13]  Heinrich Meyr,et al.  Mapping multirate dataflow to complex RT level hardware models , 1997, Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors.

[14]  Edward A. Lee,et al.  Joint Minimization of Code and Data for Synchronous Dataflow Programs , 1997, Formal Methods Syst. Des..

[15]  Jan Vitek,et al.  Terra: a multi-stage language for high-performance computing , 2013, PLDI.

[16]  Feng Qian,et al.  A close examination of performance and power characteristics of 4G LTE networks , 2012, MobiSys '12.

[17]  Marc Levoy,et al.  The Frankencamera: an experimental platform for computational photography , 2010, ACM Trans. Graph..

[18]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[19]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[20]  Christoforos E. Kozyrakis,et al.  Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.

[21]  Pat Hanrahan,et al.  Darkroom , 2014, ACM Trans. Graph..

[22]  Edward H. Adelson,et al.  PYRAMID METHODS IN IMAGE PROCESSING. , 1984 .