Paper information - General purpose computing on graphics processing units using OpenCL - Motion detection using NVIDIA Fermi and the OpenCL programming framework

Abstract: General-purpose computing on graphics processing units (GPGPU) has been an area of active research for many years. The field advanced considerably during 2009 and 2010 with the release of the Open Computing Language (OpenCL) programming framework and NVIDIA's new Fermi GPU architecture. This thesis explores the hardware architectures of three GPUs and how well they support general-purpose computation: the NVIDIA GeForce 8800 GTS (G80 architecture) from 2006, the AMD Radeon 4870 (RV700 architecture) from 2008, and the NVIDIA GeForce GTX 480 (Fermi architecture) from 2010. Particular attention is given to Fermi and its GPGPU-related improvements over earlier generations. The Lucas-Kanade algorithm for optical flow estimation was implemented in OpenCL to evaluate the framework and the relative impact of several parallel application optimizations. The RV700 architecture is not well suited for GPGPU. The G80 architecture performs very well considering its age, but reaching full performance requires considerable, and often tedious, G80-specific optimization of the parallel application. Fermi excels in all aspects of GPGPU programming: its performance is much higher than that of the RV700 and G80 architectures, even before any hardware-specific optimizations are applied, and its new memory hierarchy makes GPGPU programming both easier and faster than before. OpenCL is a stable and competent framework, well suited for any GPGPU project that would benefit from the flexibility of software and hardware platform independence. However, if performance is more important than flexibility, NVIDIA's Compute Unified Device Architecture (CUDA) or AMD's ATI Stream might be better alternatives.

Mats Johansson | Oscar Winter
