[1] Hironori Kasahara, et al. Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding, 2003, LCPC.
[2] Alexey Lastovetsky, et al. A Novel Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms, 2018, IEEE Transactions on Parallel and Distributed Systems.
[3] Vilas H. Naik, et al. Analysis of performance enhancement on graphic processor based heterogeneous architecture: A CUDA and MATLAB experiment, 2015, 2015 National Conference on Parallel Computing Technologies (PARCOMPTECH).
[4] Peng Jiang, et al. Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation, 2017, ICS.
[5] Lian-Ping Wang, et al. Scalable parallel FFT for spectral simulations on a Beowulf cluster, 2001, Parallel Comput.
[6] Fang Liu, et al. An asynchronous load balancing scheme for multi-server systems, 2016, 2016 IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON).
[7] Alexey L. Lastovetsky, et al. Data Partitioning with a Functional Performance Model of Heterogeneous Processors, 2007, Int. J. High Perform. Comput. Appl.
[8] Alexey L. Lastovetsky, et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Highly Heterogeneous HPC Platforms, 2011, Parallel Process. Lett.
[9] José Nelson Amaral, et al. Forma: A framework for safe automatic array reshaping, 2007, ACM Trans. Program. Lang. Syst.
[10] Teresa H. Y. Meng, et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[11] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[12] Satoshi Matsuoka, et al. An efficient, model-based CPU-GPU heterogeneous FFT library, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[13] Truong Vinh Truong Duy, et al. A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs, 2014, Comput. Phys. Commun.
[14] Liang Gu, et al. Using GPUs to compute large out-of-card FFTs, 2011, ICS '11.
[15] Cédric Augonnet, et al. Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, 2009, Euro-Par Workshops.
[16] Sriram Krishnamoorthy, et al. Effective padding of multidimensional arrays to avoid cache conflict misses, 2016, PLDI.
[17] Dmitry Pekurovsky, et al. P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions, 2012, SIAM J. Sci. Comput.
[18] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[19] Alexey L. Lastovetsky, et al. Data partitioning with a realistic performance model of networks of heterogeneous computers, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[20] George Cybenko, et al. Dynamic Load Balancing for Distributed Memory Multiprocessors, 1989, J. Parallel Distributed Comput.
[21] Alexey L. Lastovetsky, et al. Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms, 2011, ArXiv.
[22] Alexey L. Lastovetsky, et al. Model-Based Optimization of EULAG Kernel on Intel Xeon Phi Through Load Imbalancing, 2017, IEEE Transactions on Parallel and Distributed Systems.
[23] Ning Li, et al. 2DECOMP&FFT - A Highly Scalable 2D Decomposition Library and FFT Interface, 2010.
[24] Antonio J. Plaza, et al. Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE, 2011, The Journal of Supercomputing.
[25] Jacques M. Bahi, et al. Synchronous distributed load balancing on dynamic networks, 2005, J. Parallel Distributed Comput.
[26] Alexey Lastovetsky, et al. Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy, 2018, IEEE Transactions on Computers.
[27] Alexey L. Lastovetsky, et al. Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing, 2015, ArXiv.
[28] Ioana Banicescu, et al. Dynamic load balancing with adaptive factoring methods in scientific applications, 2007, The Journal of Supercomputing.
[29] Jacques M. Bahi, et al. Dynamic load balancing and efficient load estimators for asynchronous iterative algorithms, 2005, IEEE Transactions on Parallel and Distributed Systems.
[30] Alexey L. Lastovetsky, et al. New Model-Based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.
[31] Joseph JáJá, et al. Optimized FFT computations on heterogeneous platforms with application to the Poisson equation, 2014, J. Parallel Distributed Comput.
[32] Amir Averbuch, et al. Portable parallel FFT for MIMD multiprocessors, 1998, Concurr. Pract. Exp.
[33] Yifeng Chen, et al. Large-scale FFT on GPU clusters, 2010, ICS '10.