Performance analysis and fitness of GPGPU and multicore architectures for scientific applications

Mohammad A. Bhuiyan | M. Bhuiyan

[1] Sabine Pruggnaller,et al. Performance evaluation of image processing algorithms on the GPU , 2008, Journal of structural biology.

[2] Dharmendra S. Modha,et al. The cat is out of the bag: cortical simulations with 10^9 neurons, 10^13 synapses , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[3] Tien-Tsin Wong,et al. Evolutionary Computing on Consumer Graphics Hardware , 2007, IEEE Intelligent Systems.

[4] L. Fortuna,et al. Neuronal dynamics on FPGA: Izhikevich's model , 2005, SPIE Microtechnologies.

[5] Eugene M. Izhikevich,et al. Which model to use for cortical spiking neurons? , 2004, IEEE Transactions on Neural Networks.

[6] Uzi Vishkin,et al. A pilot study to compare programming effort for two parallel programming models , 2007, J. Syst. Softw.

[7] David Pellerin,et al. Practical FPGA programming in C , 2005 .

[8] Nikil D. Dutt,et al. A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors , 2009, Neural Networks.

[9] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp.

[10] Megan Vance. A Migration-Based Parallel Programming Model with Architectural Support Structures , 2009, 2009 DoD High Performance Computing Modernization Program Users Group Conference.

[11] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[12] Jack J. Dongarra,et al. From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming , 2012, Parallel Comput.

[13] Eugene M. Izhikevich,et al. Simple model of spiking neurons , 2003, IEEE Trans. Neural Networks.

[14] H. Markram. The Blue Brain Project , 2006, Nature Reviews Neuroscience.

[15] Paul-Jean Cagnard,et al. The parallel cellular programming model , 2000, Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing.

[16] Xiaofeng Gao,et al. Performance Sensitivity Studies for Strategic Applications , 2005, 2005 Users Group Conference (DOD-UGC'05).

[17] Gregory D. Peterson,et al. Analytical modeling of high performance reconfigurable computers: prediction and analysis of system performance , 2003 .

[18] Tarek M. Taha,et al. Character recognition with two spiking neural network models on multicore architectures , 2009, 2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing.

[19] Ivan S Ufimtsev,et al. Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation , 2009, Journal of chemical theory and computation.

[20] Gaurav Khanna,et al. Numerical modeling of gravitational wave sources accelerated by OpenCL , 2010, Comput. Phys. Commun.

[21] Patrick Horain,et al. GpuCV: A GPU-Accelerated Framework for Image Processing and Computer Vision , 2008, ISVC.

[22] Andres Upegui,et al. A Hardware Implementation of a Network of Functional Spiking Neurons with Hebbian Learning , 2004, BioADIT.

[23] Eugene M. Izhikevich,et al. Polychronization: Computation with Spikes , 2006, Neural Computation.

[24] Sayantan Sur,et al. Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms , 2007 .

[25] W. Mendenhall,et al. A Second Course in Statistics: Regression Analysis , 1996 .

[26] Luis A. Plana,et al. SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[27] Yihan Shao,et al. Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units , 2008, The journal of physical chemistry. A.

[28] Laura Carrington,et al. Modeling application performance by convolving machine signatures with application profiles , 2001 .

[29] John A. Keane,et al. Comparing distributed memory and virtual shared memory parallel programming models , 1995, Future Gener. Comput. Syst.

[30] Ivan Viola,et al. Two-Level Approach to Efficient Visualization of Protein Dynamics , 2007, IEEE Transactions on Visualization and Computer Graphics.

[31] C. Morris,et al. Voltage oscillations in the barnacle giant muscle fiber , 1981, Biophysical journal.

[32] Samuel Williams,et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .

[33] Jesús Labarta,et al. A Framework for Performance Modeling and Prediction , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[34] Pallipuram Krishnamani,et al. Acceleration of spiking neural networks on single-GPU and multi-GPU systems , 2010 .

[35] Junfeng Wu,et al. Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores , 2009, 2009 International Conference on Parallel Processing.

[36] M. L. Sawley,et al. A comparison of parallel programming models for multiblock flow computations , 1995 .

[37] Alan D. George,et al. RAT: RC Amenability Test for Rapid Performance Prediction , 2009, TRETS.

[38] Vivek K. Pallipuram,et al. Acceleration of spiking neural networks in emerging multi-core and GPU architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[39] M.C. Smith,et al. Implementation methodology for emerging reconfigurable systems , 2008, 2008 51st Midwest Symposium on Circuits and Systems.

[40] Samuel Williams,et al. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[41] K. H. Warren,et al. PFP: a scalable parallel programming model , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92.

[42] Mauricio Hanzich,et al. Assessing Accelerator-Based HPC Reverse Time Migration , 2011, IEEE Transactions on Parallel and Distributed Systems.

[43] R.H. Lee,et al. Methodology and Design Flow for Assisted Neural-Model Implementations in FPGAs , 2007, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[44] Tarek M. Taha,et al. FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition , 2009, 2009 International Conference on Reconfigurable Computing and FPGAs.

[45] Ivan S Ufimtsev,et al. Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation , 2008, Journal of chemical theory and computation.

[46] Tor M. Aamodt,et al. A first-order fine-grained multithreaded throughput model , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[47] Lutz Prechelt. A parallel programming model for irregular dynamic neural networks , 1997, Proceedings. Third Working Conference on Massively Parallel Programming Models (Cat. No.97TB100228).

[48] Gerhard Wellein,et al. Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures , 2003, Int. J. High Perform. Comput. Appl.

[49] Sanjay J. Patel,et al. Implicitly Parallel Programming Models for Thousand-Core Microprocessors , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[50] H. Wilson. Simplified dynamics of human and mammalian neocortical neurons , 1999, Journal of theoretical biology.

[51] John Paul Shen,et al. A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[52] Dimitrios S. Nikolopoulos,et al. Parallel Programming Models for Heterogeneous Multicore Architectures , 2010, IEEE Micro.

[53] Pat Hanrahan,et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.

[54] Stephen W. Poole,et al. Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors , 2010, J. Comput. Phys.

[55] Bernabé Linares-Barranco,et al. Memristance can explain Spike-Time-Dependent-Plasticity in Neural Synapses , 2009 .

[56] Wu-chun Feng,et al. Multi-dimensional characterization of temporal data mining on graphics processors , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[57] John R. Gilbert,et al. An empirical study of the performance and productivity of two parallel programming models , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[58] Faith Ellen,et al. The Complexity of Computation on the Parallel Random Access Machine , 1993 .

[59] RapidCT: Acceleration of 3D Computed Tomography on GPUs.

[60] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[61] Samuel Williams,et al. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms , 2009, J. Parallel Distributed Comput.

[62] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.

[63] Firas Hamze,et al. A Performance Comparison of CUDA and OpenCL , 2010, ArXiv.

[64] Peter J. Bentley,et al. Hardware Implementation of a Bio-plausible Neuron Model for Evolution and Growth of Spiking Neural Networks on FPGA , 2008, 2008 NASA/ESA Conference on Adaptive Hardware and Systems.

[65] Alan D. George,et al. An analytical model for multilevel performance prediction of Multi-FPGA systems , 2011, TRETS.

[66] Anders Lansner,et al. Non-commercial Research and Educational Use including without Limitation Use in Instruction at Your Institution, Sending It to Specific Colleagues That You Know, and Providing a Copy to Your Institution's Administrator. All Other Uses, Reproduction and Distribution, including without Limitation Comm , 2022 .

[67] Samuel Williams,et al. Sparse Matrix-Vector Multiplication on Multicore and Accelerators , 2010, Scientific Computing with Multicore and Accelerators.

[68] W. Rall. Branching dendritic trees and motoneuron membrane resistivity , 1959, Experimental neurology.

[69] Samuel Williams,et al. Auto-Tuning the 27-point Stencil for Multicore , 2009 .

[70] Witold R. Rudnicki,et al. An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[71] Martin Hopkins,et al. Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.

[72] Vivek K. Pallipuram,et al. A comparative study of GPU programming models and architectures using neural networks , 2011, The Journal of Supercomputing.

[73] G. Edelman,et al. Large-scale model of mammalian thalamocortical systems , 2008, Proceedings of the National Academy of Sciences.

[74] A. Hodgkin,et al. A quantitative description of membrane current and its application to conduction and excitation in nerve , 1952, The Journal of physiology.

[75] Lyle N. Long,et al. Character Recognition using Spiking Neural Networks , 2007, 2007 International Joint Conference on Neural Networks.

[76] A. Marowka. Towards High-Level Parallel Programming Models for Multicore Systems , 2008, 2008 Advanced Software Engineering and Its Applications.

[77] Sadaf R. Alam,et al. Impact of multicores on large-scale molecular dynamics simulations , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[78] Changjian Gao,et al. Cortical Models Onto CMOL and CMOS— Architectures and Performance/Price , 2007, IEEE Transactions on Circuits and Systems I: Regular Papers.

[79] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.