ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors
[1] David A. Padua,et al. An Evaluation of Vectorizing Compilers , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[2] Toshio Nakatani,et al. AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[3] Austin R. Benson,et al. A framework for practical parallel fast matrix multiplication , 2014, PPoPP.
[4] M. Pharr,et al. ispc: A SPMD compiler for high-performance CPU programming , 2012, 2012 Innovative Parallel Computing (InPar).
[5] Kenneth E. Batcher,et al. Designing Sorting Networks , 2011 .
[6] Franz Franchetti,et al. Operator Language: A Program Generation Framework for Fast Kernels , 2009, DSL.
[7] Rezaur Rahman,et al. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers , 2013 .
[8] Thomas N. Hibbard. An empirical study of minimal storage sorting , 1963, CACM.
[9] Pradeep Dubey,et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.
[10] Franz Franchetti,et al. Generating SIMD Vectorized Permutations , 2008, CC.
[11] Markus Püschel,et al. Computer generation of streaming sorting networks , 2012, DAC Design Automation Conference 2012.
[12] Rezaur Rahman. Intel® Xeon Phi™ Coprocessor Architecture and Tools , 2013, Apress.
[13] Andrew A. Davidson,et al. Efficient parallel merge sort for fixed and variable length keys , 2012, 2012 Innovative Parallel Computing (InPar).
[14] Sanjeev Kumar,et al. Efficient implementation of sorting on multi-core SIMD CPU architecture , 2008, VLDB 2008.
[15] Michael Garland,et al. Designing efficient sorting algorithms for manycore GPUs , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[16] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[17] Gagan Agrawal,et al. A programming system for Xeon Phis with runtime SIMD parallelization , 2014, ICS '14.
[18] Hari Sundar,et al. HykSort: a new variant of hypercube quicksort on distributed memory architectures , 2013, ICS '13.
[19] Kenneth E. Batcher,et al. Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.
[20] James Reinders,et al. Intel® threading building blocks , 2008 .
[21] Franz Franchetti,et al. Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets , 2011, ICS '11.
[22] Satoshi Matsuoka,et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[23] Alexander A. Stepanov,et al. C++ Standard Template Library , 2000 .
[24] W. Weissblum. A Sorting Problem , 1960 .
[25] Kenneth E. Batcher,et al. Designing Sorting Networks: A New Paradigm , 2011 .
[26] Boris Schling. The Boost C++ Libraries , 2011 .
[27] Pradeep Dubey,et al. Efficient implementation of sorting on multi-core SIMD CPU architecture , 2008, Proc. VLDB Endow..