Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
[1] Jiri Matas,et al. Online learning of robust object detectors during unstable tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[2] Mike O'Connor,et al. Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[3] Jan Vitek,et al. A family of real‐time Java benchmarks , 2011, Concurr. Comput. Pract. Exp..
[4] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[5] Kushagra Vaid,et al. Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.
[6] Kevin Skadron,et al. Accelerating SQL database operations on a GPU with CUDA , 2010, GPGPU-3.
[7] Dong Li,et al. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures , 2012, CF '12.
[8] Karama Kanoun,et al. The Autonomic Computing Benchmark , 2008 .
[9] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[10] Kim M. Hazelwood,et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[11] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[12] William Thies,et al. Teleport messaging for distributed stream programs , 2005, PPoPP.
[13] Kai Li,et al. Fidelity and scaling of the PARSEC benchmark inputs , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[14] Tony Lau,et al. The Autonomic Computing Benchmark , 2008 .
[15] David R. Kaeli,et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures , 2011, IEEE Transactions on Parallel and Distributed Systems.
[16] Wen-mei W. Hwu,et al. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors , 2012, PPoPP '12.
[17] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[18] Benedict R. Gaster,et al. Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck? , 2012, Computer.
[19] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[20] Lieven Eeckhout,et al. Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[21] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[22] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[23] Jeffrey S. Vetter,et al. Maestro: Data Orchestration and Tuning for OpenCL Devices , 2010, Euro-Par.
[24] David Kaeli,et al. Heterogeneous Computing with OpenCL , 2011 .
[25] Milind Kulkarni,et al. Towards architecture independent metrics for multicore performance analysis , 2011, PERV.
[26] Daisuke Takahashi,et al. The HPC Challenge (HPCC) benchmark suite , 2006, SC.
[27] Kai Nagel,et al. Multi-agent traffic simulation with CUDA , 2009, 2009 International Conference on High Performance Computing & Simulation.
[28] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[29] Scott B. Baden,et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration , 2012, IEEE Micro.
[30] David R. Kaeli,et al. Analyzing program flow within a many-kernel OpenCL application , 2011, GPGPU-4.
[31] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[32] William J. Dally,et al. Energy-efficient mechanisms for managing thread context in throughput processors , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[33] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[34] Timothy G. Mattson,et al. OpenCL Programming Guide , 2011 .