Benchmarking Data and Compute Intensive Applications on Modern CPU and GPU Architectures
[1] Pradeep Dubey, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, 2010, ISCA.
[2] Ware Myers. Supercomputing 91, 1992.
[3] Kurt Keutzer, et al. Considerations When Evaluating Microprocessor Platforms, 2011, HotPar.
[4] Jiří Matela. GPU-Based DWT Acceleration for JPEG2000, 2009.
[5] Stewart Taylor, et al. Optimizing Applications for Multi-Core Processors, Using the Intel® Integrated Performance Primitives, Second Edition, 2007.
[6] Pawel Gepner, et al. Early Performance Evaluation of New Six-Core Intel® Xeon® 5600 Family Processors for HPC, 2010, 2010 Ninth International Symposium on Parallel and Distributed Computing.
[7] Nagiza F. Samatova, et al. Lessons Learned from Exploring the Backtracking Paradigm on the GPU, 2011, Euro-Par.
[8] Jos B. T. M. Roerdink, et al. Accelerating Wavelet Lifting on Graphics Hardware Using CUDA, 2011, IEEE Transactions on Parallel and Distributed Systems.
[9] Sanketh Datla, et al. Parallelizing Motion JPEG 2000 with CUDA, 2009, 2009 Second International Conference on Computer and Electrical Engineering.
[10] Michal Kierzynka, et al. Efficient Isosurface Extraction Using Marching Tetrahedra and Histogram Pyramids on Multiple GPUs, 2011, PPAM.
[11] Enrico Magli, et al. Transform Coding Techniques for Lossy Hyperspectral Data Compression, 2007, IEEE Transactions on Geoscience and Remote Sensing.
[12] Murat Efe Guney, et al. On the limits of GPU acceleration, 2010.
[13] Jacek Blazewicz, et al. G-MSA – A GPU-based, fast and accurate algorithm for multiple sequence alignment, 2013, J. Parallel Distributed Comput.
[14] Pradeep Dubey, et al. FAST: fast architecture sensitive tree search on modern CPUs and GPUs, 2010, SIGMOD Conference.
[15] Susan S. Young, et al. JPEG 2000 compression of medical imagery, 2000, Medical Imaging.
[16] Mircea Andrecut, et al. Parallel GPU Implementation of Iterative PCA Algorithms, 2008, J. Comput. Biol.
[17] Michal Kierzynka, et al. CaKernel – A parallel application programming framework for heterogenous computing architectures, 2011.
[18] Petr Holub, et al. GPU-Based Sample-Parallel Context Modeling for EBCOT in JPEG2000, 2010, MEMICS.
[19] Alex Fit-Florea, et al. Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs, 2011.
[20] Manuel E. Acacio, et al. A Parallel Implementation of the 2D Wavelet Transform Using CUDA, 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[21] Pawel Gepner, et al. Parallel application benchmarks and performance evaluation of the Intel Xeon 7500 family processors, 2011, ICCS.
[22] Jacek Blazewicz, et al. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs, 2011, BMC Bioinformatics.
[23] David A. Bader, et al. Computing discrete transforms on the Cell Broadband Engine, 2009, Parallel Comput.
[24] David S. Taubman, et al. High performance scalable image compression with EBCOT, 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).
[25] Jack J. Dongarra, et al. The LINPACK Benchmark: past, present and future, 2003, Concurr. Comput. Pract. Exp.
[26] Antonio Plaza, et al. GPU implementation of JPEG2000 for hyperspectral image compression, 2011, Remote Sensing.
[27] Pradeep Dubey, et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort, 2010, SIGMOD Conference.
[28] David H. Bailey, et al. The NAS parallel benchmarks summary and preliminary results, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[29] Michael Lang, et al. A Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing, 2008, Parallel Process. Lett.
[30] Ed Anderson, et al. LAPACK Users' Guide, 1995.
[31] Chih-Hsien Hsia, et al. High Efficiency EBCOT with Parallel Coding Architecture for JPEG2000, 2006, EURASIP J. Adv. Signal Process.
[32] Pawel Gepner, et al. Evaluation of Executing DGEMM Algorithms on Modern Multicore CPU, 2011.
[33] Satoshi Matsuoka, et al. Auto-tuning 3-D FFT library for CUDA GPUs, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.