FlexWAFE - an architecture for reconfigurable image processing systems (FlexWAFE - eine Architektur für rekonfigurierbare Bildverarbeitungssysteme)

Demand for high-resolution digital media content has recently grown in the cinema and television industries. Existing systems either fail to meet the requirements or are too expensive. New hardware systems and new programming techniques are needed to satisfy high-resolution, high-quality image requirements while reducing costs. The industry is looking for a flexible architecture that can run multiple applications on standard components with reduced development times. Until now, common practice has been to develop specialized architectures and systems that target a single application. This approach offers little flexibility and leads to high development costs, because each new application is designed almost from scratch. Our focus was to develop an architecture suited to image processing that is flexible enough to run multiple applications on the same FPGA-based hardware platform. The novelty of our approach is that we reconfigure parts of the architecture at run time, but without the time and constraint penalties of FPGA partial reconfiguration techniques. The architecture uses a hierarchical control structure that is well suited to parallel processing and enables single-cycle-latency reconfiguration of parts of the processing pipeline. This is achieved using relatively few resources for the distributed control structures. To test the architecture, a complex film grain noise reduction algorithm was implemented on a standard hardware platform developed by Thomson-Grass Valley. The system met all requirements and placed very little load on the hierarchical control structures, leaving ample headroom for considerably more complex control requirements. The architecture has been ported to other hardware platforms, and other applications have also been implemented. Run-time reconfigurability has been a key factor in the success of FlexWAFE.
