Performance Optimization of Multithreaded 2D Fast Fourier Transform on Multicore Processors Using Load Imbalancing Parallel Computing Method
[1] Wei Chu, et al. A Noise-Robust FFT-Based Auditory Spectrum With Application in Audio Classification, 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[2] José Nelson Amaral, et al. Forma: A framework for safe automatic array reshaping, 2007, ACM Trans. Program. Lang. Syst.
[3] Alexey L. Lastovetsky, et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Highly Heterogeneous HPC Platforms, 2011, Parallel Process. Lett.
[4] Guang R. Gao, et al. Optimizing the Fast Fourier Transform on a Multi-core Architecture, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[5] João P.F. Barbosa, et al. A high performance hardware accelerator for dynamic texture segmentation, 2015, J. Syst. Archit.
[6] Satoshi Matsuoka, et al. An efficient, model-based CPU-GPU heterogeneous FFT library, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[7] Yifeng Chen, et al. Large-scale FFT on GPU clusters, 2010, ICS '10.
[8] Lian-Ping Wang, et al. Scalable parallel FFT for spectral simulations on a Beowulf cluster, 2001, Parallel Comput.
[9] Teresa H. Y. Meng, et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[10] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[11] George Cybenko, et al. Dynamic Load Balancing for Distributed Memory Multiprocessors, 1989, J. Parallel Distributed Comput.
[12] Alexey L. Lastovetsky, et al. Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms, 2011, ArXiv.
[13] Alexey L. Lastovetsky, et al. Model-Based Optimization of EULAG Kernel on Intel Xeon Phi Through Load Imbalancing, 2017, IEEE Transactions on Parallel and Distributed Systems.
[14] Alexey Lastovetsky, et al. Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy, 2018, IEEE Transactions on Computers.
[15] Alexey L. Lastovetsky, et al. Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing, 2015, ArXiv.
[16] Francisco Almeida, et al. Parallel FFT-2D in Heterogeneous Systems, 2005, Parallel and Distributed Computing and Networks.
[17] Antonio J. Plaza, et al. Automatic tuning of iterative computation on heterogeneous multiprocessors with ADITHE, 2011, The Journal of Supercomputing.
[18] Jacques M. Bahi, et al. Synchronous distributed load balancing on dynamic networks, 2005, J. Parallel Distributed Comput.
[19] Dragan Matic, et al. Fault Diagnosis of Rotating Electrical Machines in Transient Regime Using a Single Stator Current’s FFT, 2015, IEEE Transactions on Instrumentation and Measurement.
[20] Toshiyuki Imamura, et al. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations, 2016, Comput. Phys. Commun.
[21] Alexey L. Lastovetsky, et al. Data partitioning with a realistic performance model of networks of heterogeneous computers, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[22] Hironori Kasahara, et al. Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding, 2003, LCPC.
[23] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[24] Ning Li, et al. 2DECOMP&FFT - A Highly Scalable 2D Decomposition Library and FFT Interface, 2010.
[25] Liang Gu, et al. Using GPUs to compute large out-of-card FFTs, 2011, ICS '11.
[26] Jeffrey K. Hollingsworth, et al. Computation-communication overlap and parameter auto-tuning for scalable parallel 3-D FFT, 2016, J. Comput. Sci.
[27] Lian-Ping Wang, et al. Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition, 2013, Parallel Comput.
[28] Vilas H. Naik, et al. Analysis of performance enhancement on graphic processor based heterogeneous architecture: A CUDA and MATLAB experiment, 2015, 2015 National Conference on Parallel Computing Technologies (PARCOMPTECH).
[29] Fang Liu, et al. An asynchronous load balancing scheme for multi-server systems, 2016, 2016 IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON).
[30] Alexey L. Lastovetsky, et al. Data Partitioning with a Functional Performance Model of Heterogeneous Processors, 2007, Int. J. High Perform. Comput. Appl.
[31] Cédric Augonnet, et al. Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, 2009, Euro-Par Workshops.
[32] Truong Vinh Truong Duy, et al. A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs, 2014, Comput. Phys. Commun.
[33] Alexey Lastovetsky, et al. A Novel Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms, 2018, IEEE Transactions on Parallel and Distributed Systems.
[34] Myeongsu Kang, et al. Time-Varying and Multiresolution Envelope Analysis and Discriminative Feature Analysis for Bearing Fault Diagnosis, 2015, IEEE Transactions on Industrial Electronics.
[35] Sriram Krishnamoorthy, et al. Effective padding of multidimensional arrays to avoid cache conflict misses, 2016, PLDI.
[36] Dmitry Pekurovsky, et al. P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions, 2012, SIAM J. Sci. Comput.
[37] Alexey L. Lastovetsky, et al. New Model-Based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.
[38] Laurent Alaus, et al. A common operator for FFT and FEC decoding, 2011, Microprocess. Microsystems.
[39] Joseph JáJá, et al. Optimized FFT computations on heterogeneous platforms with application to the Poisson equation, 2014, J. Parallel Distributed Comput.
[40] Amir Averbuch, et al. Portable parallel FFT for MIMD multiprocessors, 1998, Concurr. Pract. Exp.
[41] Ioana Banicescu, et al. Dynamic load balancing with adaptive factoring methods in scientific applications, 2007, The Journal of Supercomputing.
[42] Jacques M. Bahi, et al. Dynamic load balancing and efficient load estimators for asynchronous iterative algorithms, 2005, IEEE Transactions on Parallel and Distributed Systems.