Dhairya Malhotra | George Biros | Amir Gholami | Judith C. Hill
[1] George Karypis,et al. Introduction to Parallel Computing, 1994.
[2] Salvatore Filippone. The IBM Parallel Engineering and Scientific Subroutine Library , 1995, PARA.
[3] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems, 1997, IEEE Trans. Parallel Distributed Syst.
[4] Jacob K. White,et al. A precorrected-FFT method for electrostatic analysis of complicated 3-D structures, 1997, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[5] J. Michel,et al. Effective properties of composite materials with periodic microstructure: a computational approach, 1999.
[6] O. Bruno,et al. A fast, high-order algorithm for the solution of surface scattering problems: basic implementation, tests, and applications, 2001.
[7] 장윤희,et al. Y. , 2003, Industrial and Labor Relations Terms.
[8] Charles S. Peskin,et al. Shared-Memory Parallel Vector Implementation of the Immersed Boundary Method for the Computation of Blood Flow in the Beating Mammalian Heart , 2004, The Journal of Supercomputing.
[9] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[10] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[11] Víctor M. Pérez-García,et al. Spectral Methods for Partial Differential Equations in Irregular Domains: The Spectral Smoothed Boundary Method, 2006, SIAM J. Sci. Comput.
[12] Y. Mukaigawa,et al. Large Deviations Estimates for Some Non-local Equations I. Fast Decaying Kernels and Explicit Bounds , 2022 .
[13] Stephen R. Comeau,et al. PIPER: An FFT‐based protein docking program with pairwise potentials , 2006, Proteins.
[14] P. Hut,et al. Gravitational N-body Simulations, 2008, arXiv:0806.3950.
[15] Franz Franchetti,et al. Discrete Fourier transform on multicore, 2009, IEEE Signal Processing Magazine.
[16] Daisuke Takahashi. An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors , 2009, PPAM.
[17] William Gropp,et al. An introductory exascale feasibility study for FFTs and multigrid , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[18] Ning Li,et al. 2DECOMP&FFT - A Highly Scalable 2D Decomposition Library and FFT Interface, 2010.
[19] Edmond Chow,et al. Exploiting 162-Nanosecond End-to-End Communication Latency on Anton , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Liang Gu,et al. Using GPUs to compute large out-of-card FFTs , 2011, ICS '11.
[21] Ping Tak Peter Tang,et al. A framework for low-communication 1-D FFT , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Franz Franchetti,et al. Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P , 2012, VECPAR.
[23] Jing Wu,et al. Optimized strategies for mapping three-dimensional FFTs onto CUDA GPUs , 2012, 2012 Innovative Parallel Computing (InPar).
[24] Richard W. Vuduc,et al. On the communication complexity of 3D FFTs and its implications for Exascale , 2012, ICS '12.
[25] Satoshi Matsuoka,et al. Scalable multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Dmitry Pekurovsky,et al. P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions, 2012, SIAM J. Sci. Comput.
[27] Alistair P. Rendell,et al. Implementation of 3D FFTs Across Multiple GPUs in Shared Memory Environments , 2012, 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies.
[28] Guang R. Gao,et al. Demystifying Performance Predictions of Distributed FFT3D Implementations , 2012, NPC.
[29] Lian-Ping Wang,et al. Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition, 2013, Parallel Comput.
[30] D. Takahashi. Implementation of Parallel 1-D FFT on GPU Clusters , 2013, 2013 IEEE 16th International Conference on Computational Science and Engineering.
[31] Myoungkyu Lee,et al. Petascale direct numerical simulation of turbulent channel flow on up to 786K cores , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[32] Daniel Potts,et al. Parallel Three-Dimensional Nonequispaced Fast Fourier Transforms and Their Application to Particle Simulation, 2013, SIAM J. Sci. Comput.
[33] Pradeep Dubey,et al. Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[34] Michael Pippig. PFFT: An Extension of FFTW to Massively Parallel Architectures, 2013, SIAM J. Sci. Comput.
[35] Hari Sundar,et al. HykSort: a new variant of hypercube quicksort on distributed memory architectures , 2013, ICS '13.
[36] Jesper Larsson Träff,et al. Implementing a classic: zero-copy all-to-all communication with MPI datatypes, 2014, ICS '14.
[37] Jeffrey K. Hollingsworth,et al. Scaling Parallel 3-D FFT with Non-Blocking MPI Collectives , 2014, 2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems.
[38] Truong Vinh Truong Duy,et al. A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs, 2014, Comput. Phys. Commun.
[39] Endong Wang,et al. Intel Math Kernel Library, 2014.
[40] Martin D. Schatz,et al. Parallel Matrix Multiplication: A Systematic Journey, 2016, SIAM J. Sci. Comput.