Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications

In recent years, modern graphics processing units have been widely adopted in high performance computing areas to solve large scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores and the memory subsystem. In this paper, we conduct a comprehensive study to characterize the architectural differences between Nvidia's Fermi and ATI's Cypress and demonstrate their impact on performance. Our results indicate that these two products have diverse advantages that are reflected in their performance for different sets of applications. In addition, we also compare the energy efficiencies of these two platforms since power/energy consumption is a major concern in the high performance computing.

[1]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[2]  D. N. Geary Mixture Models: Inference and Applications to Clustering , 1989 .

[3]  A. Raftery,et al.  Model-based Gaussian and non-Gaussian clustering , 1993 .

[4]  G. Celeux,et al.  Comparison of the mixture and the classification maximum likelihood in cluster analysis , 1993 .

[5]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[7]  Lieven Eeckhout,et al.  Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[8]  Lu Peng,et al.  Memory Performance and Scalability of Intel's and AMD's Dual-Core Processors: A Case Study , 2007, 2007 IEEE International Performance, Computing, and Communications Conference.

[9]  Majid Sarrafzadeh,et al.  Energy-aware high performance computing with graphic processing units , 2008, CLUSTER 2008.

[10]  Carl Staelin,et al.  Memory hierarchy performance measurement of commercial dual-core desktop processors , 2008, J. Syst. Archit..

[11]  Hyesoon Kim,et al.  An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[12]  Song Huang,et al.  On the energy efficiency of graphics processing units for scientific computing , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[13]  Wolfgang E. Nagel,et al.  Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  J. Xu OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems , 2009 .

[15]  Andreas Moshovos,et al.  Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[16]  Hyesoon Kim,et al.  An integrated GPU power and performance model , 2010, ISCA.

[17]  Tao Li,et al.  Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[18]  Reiji Suda,et al.  Investigation on the power efficiency of multi-core and GPU Processing Element in large scale SIMD computation with CUDA , 2010, International Conference on Green Computing.

[19]  Collin McCurdy,et al.  The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.

[20]  Xiaoming Li,et al.  A Micro-benchmark Suite for AMD GPUs , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[21]  Bin Li,et al.  Performance and Power Analysis of ATI GPU: A Statistical Approach , 2011, 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage.

[22]  Bin Li,et al.  Tree structured analysis on GPU power study , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[23]  Kim M. Hazelwood,et al.  Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[24]  Mohamed F. Ahmed,et al.  A comparative benchmarking of the FFT on Fermi and Evergreen GPUs , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[25]  Yao Zhang,et al.  A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[26]  Jack J. Dongarra,et al.  From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming , 2012, Parallel Comput..