Parallelization of Hierarchical Matrix Algorithms for Electromagnetic Scattering Problems
Christoph W. Kessler | Corinne Ancourt | Elisabeth Larsson | Clemens Grelck | Francesca Vipiana | Giuseppe Vecchi | Matteo Alessandro Francavilla | Marco Righero | Giorgio Giordanengo | Afshin Zafari
[1] Corinne Ancourt,et al. An up to date Mapping Methodology for GPUs, 2018.
[2] Bernd Scheuermann,et al. A Data-Flow Based Coordination Approach to Concurrent Software Engineering , 2012, 2012 Data-Flow Execution Models for Extreme Scale Computing.
[3] Lexing Ying,et al. A Parallel Directional Fast Multipole Method , 2013, SIAM J. Sci. Comput..
[4] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[5] Satoshi Matsuoka,et al. Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms , 2016, 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS).
[6] Christoph W. Kessler,et al. SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems , 2018, International Journal of Parallel Programming.
[7] Hatem Ltaief,et al. Data‐driven execution of fast multipole methods , 2012, Concurr. Comput. Pract. Exp..
[8] Jack Dongarra,et al. LAPACK Users' Guide, 3rd ed., 1999.
[9] Matthew G. Knepley,et al. PetFMM—A dynamically load‐balancing parallel fast multipole library , 2009, ArXiv.
[10] Martin Nilsson,et al. Fast Numerical Techniques for Electromagnetic Problems in Frequency Domain, Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 916, 2003.
[11] Emmanuel Agullo,et al. Task-Based FMM for Multicore Architectures , 2014, SIAM J. Sci. Comput..
[12] Emmanuel Agullo,et al. Bridging the Gap Between OpenMP and Task-Based Runtime Systems for the Fast Multipole Method , 2017, IEEE Transactions on Parallel and Distributed Systems.
[13] Thomas Hérault,et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability , 2013, Computing in Science & Engineering.
[14] Kathleen Knobe,et al. Ease of use with concurrent collections (CnC), 2009.
[15] Samuel Thibault,et al. On Runtime Systems for Task-based Programming on Heterogeneous Platforms, 2018.
[16] Elisabeth Larsson,et al. DuctTeip: An efficient programming model for distributed task based parallel computing , 2018, Parallel Comput..
[17] Jiming Song,et al. Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects, 1997.
[18] Cyril Bordage,et al. Parallelization on Heterogeneous Multicore and Multi-GPU Systems of the Fast Multipole Method for the Helmholtz Equation Using a Runtime System, 2012.
[19] Jin-Fa Lee,et al. A fast IE-FFT algorithm for solving PEC scattering problems, 2005.
[20] Ludek Matyska,et al. Optimizing CUDA code by kernel fusion: application on BLAS , 2013, The Journal of Supercomputing.
[21] Richard W. Vuduc,et al. A massively parallel adaptive fast-multipole method on heterogeneous architectures , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[22] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[23] Jakub Kurzak,et al. Massively parallel implementation of a fast multipole method for distributed memory machines , 2005, J. Parallel Distributed Comput..
[24] Ozgur Ergul,et al. Hierarchical parallelization of the multilevel fast multipole algorithm (MLFMA), 2013.
[25] Francesca Vipiana,et al. Nested Equivalent Source Approximation for the Modeling of Multiscale Structures , 2014, IEEE Transactions on Antennas and Propagation.
[26] Christoph W. Kessler,et al. Lazy Allocation and Transfer Fusion Optimization for GPU-Based Heterogeneous Systems , 2018, 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP).
[27] Jiri Filipovic,et al. OpenCL Kernel Fusion for GPU, Xeon Phi and CPU , 2015, 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[28] Mohamed Wahib,et al. Scalable Kernel Fusion for Memory-Bound GPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[29] Sverker Holmgren,et al. Dynamic Autotuning of Adaptive Fast Multipole Methods on Hybrid Multicore CPU and GPU Systems , 2013, SIAM J. Sci. Comput..
[30] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[31] Vivek Sarkar,et al. Declarative aspects of memory management in the concurrent collections parallel programming model , 2009, DAMP '09.
[32] Clemens Grelck,et al. An Efficient Scalable Runtime System for Macro Data Flow Processing Using S-Net , 2014, International Journal of Parallel Programming.
[33] Andrew Richards,et al. Programmability and performance portability aspects of heterogeneous multi-/manycore systems , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[34] Simon McIntosh-Smith,et al. On the Performance of Parallel Tasking Runtimes for an Irregular Fast Multipole Method Application , 2017, IWOMP.
[35] D. Wilton,et al. Electromagnetic scattering by surfaces of arbitrary shape, 1980.
[36] Elisabeth Larsson,et al. Task parallel implementation of a solver for electromagnetic scattering problems , 2018, ArXiv.
[37] Elisabeth Larsson,et al. Resource-Aware Task Scheduling , 2015, ACM Trans. Embed. Comput. Syst..
[38] Vivek Sarkar,et al. Multi-core Implementations of the Concurrent Collections Programming Model, 2008.
[39] Christoph W. Kessler,et al. SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.
[40] Giuseppe Vecchi,et al. Wideband Fast Kernel-Independent Modeling of Large Multiscale Structures Via Nested Equivalent Source Approximation , 2015, IEEE Transactions on Antennas and Propagation.
[41] Wei Yi,et al. Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU , 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[42] M. Vouvakis,et al. The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems , 2005, IEEE Transactions on Electromagnetic Compatibility.
[43] Afshin Zafari,et al. TaskUniVerse: A Task-Based Unified Interface for Versatile Parallel Execution , 2017, PPAM.
[44] Clemens Grelck,et al. Distributed S-Net: Cluster and Grid Computing without the Hassle , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[45] Jack Dongarra,et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[46] Thomas Fahringer,et al. Adaptive Granularity Control in Task Parallel Programs Using Multiversioning , 2013, Euro-Par.
[47] Francesca Vipiana,et al. EFIE Modeling of High-Definition Multiscale Structures , 2010, IEEE Transactions on Antennas and Propagation.
[48] Alexander V. Shafarenko,et al. Coordinating Data Parallel SAC Programs with S-Net , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[49] Richard W. Vuduc,et al. Performance evaluation of concurrent collections on high-performance multicore computing systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[50] Alexander V. Shafarenko,et al. The Cost and Benefits of Coordination Programming: Two Case Studies in Concurrent Collections and S-NET , 2016, Parallel Process. Lett..
[51] Nicholas Carriero,et al. Coordination languages and their significance , 1992, CACM.
[52] Michael F. P. O'Boyle,et al. MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching , 2018, GPGPU@PPoPP.
[53] Bo Zhang,et al. Asynchronous Task Scheduling of the Fast Multipole Method Using Various Runtime Systems , 2014, 2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing.
[54] Jürgen Teich,et al. Automatic Kernel Fusion for Image Processing DSLs , 2018, SCOPES.
[55] Jeff A. Stuart,et al. A study of Persistent Threads style GPU programming for GPGPU workloads , 2012, 2012 Innovative Parallel Computing (InPar).
[56] Alejandro Duran,et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[57] Alexander V. Shafarenko,et al. Asynchronous Stream Processing with S-Net , 2010, International Journal of Parallel Programming.
[58] Martin Tillenius,et al. SuperGlue: A Shared Memory Framework Using Data Versioning for Dependency-Aware Task-Based Parallelization , 2015, SIAM J. Sci. Comput..
[59] Francesca Vipiana,et al. A Doubly Hierarchical MoM for High-Fidelity Modeling of Multiscale Structures , 2014, IEEE Transactions on Electromagnetic Compatibility.
[60] Eric Darve,et al. The fast multipole method on parallel clusters, multicore processors, and graphics processing units, 2011.
[61] S. Velamparambil,et al. Analysis and performance of a distributed memory multilevel fast multipole algorithm , 2005, IEEE Transactions on Antennas and Propagation.
[62] Alexander V. Shafarenko,et al. Parallel signal processing with S-Net , 2010, ICCS.
[63] Petru Eles,et al. Latency-aware packet processing on CPU-GPU heterogeneous systems , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).