Performance analysis and fitness of GPGPU and multicore architectures for scientific applications

[1]  Sabine Pruggnaller,et al.  Performance evaluation of image processing algorithms on the GPU. , 2008, Journal of structural biology.

[2]  Dharmendra S. Modha,et al.  The cat is out of the bag: cortical simulations with 10^9 neurons, 10^13 synapses , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[3]  Tien-Tsin Wong,et al.  Evolutionary Computing on Consumer Graphics Hardware , 2007, IEEE Intelligent Systems.

[4]  L. Fortuna,et al.  Neuronal dynamics on FPGA: Izhikevich's model , 2005, SPIE Microtechnologies.

[5]  Eugene M. Izhikevich,et al.  Which model to use for cortical spiking neurons? , 2004, IEEE Transactions on Neural Networks.

[6]  Uzi Vishkin,et al.  A pilot study to compare programming effort for two parallel programming models , 2007, J. Syst. Softw..

[7]  David Pellerin,et al.  Practical FPGA programming in C , 2005 .

[8]  Nikil D. Dutt,et al.  A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors , 2009, Neural Networks.

[9]  Robert A. van de Geijn,et al.  SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..

[10]  Megan Vance A Migration-Based Parallel Programming Model with Architectural Support Structures , 2009, 2009 DoD High Performance Computing Modernization Program Users Group Conference.

[11]  Samuel Williams,et al.  Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[12]  Jack J. Dongarra,et al.  From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming , 2012, Parallel Comput..

[13]  Eugene M. Izhikevich,et al.  Simple model of spiking neurons , 2003, IEEE Trans. Neural Networks.

[14]  H. Markram The Blue Brain Project , 2006, Nature Reviews Neuroscience.

[15]  Paul-Jean Cagnard,et al.  The parallel cellular programming model , 2000, Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing.

[16]  Xiaofeng Gao,et al.  Performance Sensitivity Studies for Strategic Applications , 2005, 2005 Users Group Conference (DOD-UGC'05).

[17]  Gregory D. Peterson,et al.  Analytical modeling of high performance reconfigurable computers: prediction and analysis of system performance , 2003 .

[18]  Tarek M. Taha,et al.  Character recognition with two spiking neural network models on multicore architectures , 2009, 2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing.

[19]  Ivan S Ufimtsev,et al.  Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation. , 2009, Journal of chemical theory and computation.

[20]  Gaurav Khanna,et al.  Numerical modeling of gravitational wave sources accelerated by OpenCL , 2010, Comput. Phys. Commun..

[21]  Patrick Horain,et al.  GpuCV: A GPU-Accelerated Framework for Image Processing and Computer Vision , 2008, ISVC.

[22]  Andres Upegui,et al.  A Hardware Implementation of a Network of Functional Spiking Neurons with Hebbian Learning , 2004, BioADIT.

[23]  Eugene M. Izhikevich,et al.  Polychronization: Computation with Spikes , 2006, Neural Computation.

[24]  Sayantan Sur,et al.  Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms , 2007 .

[25]  W. Mendenhall,et al.  A Second Course in Statistics: Regression Analysis , 1996 .

[26]  Luis A. Plana,et al.  SpiNNaker: Mapping neural networks onto a massively-parallel chip multiprocessor , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[27]  Yihan Shao,et al.  Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units. , 2008, The journal of physical chemistry. A.

[28]  Laura Carrington,et al.  Modeling application performance by convolving machine signatures with application profiles , 2001 .

[29]  John A. Keane,et al.  Comparing distributed memory and virtual shared memory parallel programming models , 1995, Future Gener. Comput. Syst..

[30]  Ivan Viola,et al.  Two-Level Approach to Efficient Visualization of Protein Dynamics , 2007, IEEE Transactions on Visualization and Computer Graphics.

[31]  C. Morris,et al.  Voltage oscillations in the barnacle giant muscle fiber. , 1981, Biophysical journal.

[32]  Samuel Williams,et al.  Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .

[33]  Jesús Labarta,et al.  A Framework for Performance Modeling and Prediction , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[34]  Pallipuram Krishnamani,et al.  Acceleration of spiking neural networks on single-GPU and multi-GPU systems , 2010 .

[35]  Junfeng Wu,et al.  Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores , 2009, 2009 International Conference on Parallel Processing.

[36]  M. L. Sawley,et al.  A comparison of parallel programming models for multiblock flow computations , 1995 .

[37]  Alan D. George,et al.  RAT: RC Amenability Test for Rapid Performance Prediction , 2009, TRETS.

[38]  Vivek K. Pallipuram,et al.  Acceleration of spiking neural networks in emerging multi-core and GPU architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[39]  M.C. Smith,et al.  Implementation methodology for emerging reconfigurable systems , 2008, 2008 51st Midwest Symposium on Circuits and Systems.

[40]  Samuel Williams,et al.  Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[41]  K. H. Warren,et al.  PFP: a scalable parallel programming model , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92..

[42]  Mauricio Hanzich,et al.  Assessing Accelerator-Based HPC Reverse Time Migration , 2011, IEEE Transactions on Parallel and Distributed Systems.

[43]  R.H. Lee,et al.  Methodology and Design Flow for Assisted Neural-Model Implementations in FPGAs , 2007, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[44]  Tarek M. Taha,et al.  FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition , 2009, 2009 International Conference on Reconfigurable Computing and FPGAs.

[45]  Ivan S Ufimtsev,et al.  Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. , 2008, Journal of chemical theory and computation.

[46]  Tor M. Aamodt,et al.  A first-order fine-grained multithreaded throughput model , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[47]  Lutz Prechelt A parallel programming model for irregular dynamic neural networks , 1997, Proceedings. Third Working Conference on Massively Parallel Programming Models (Cat. No.97TB100228).

[48]  Gerhard Wellein,et al.  Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures , 2003, Int. J. High Perform. Comput. Appl..

[49]  Sanjay J. Patel,et al.  Implicitly Parallel Programming Models for Thousand-Core Microprocessors , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[50]  H. Wilson Simplified dynamics of human and mammalian neocortical neurons. , 1999, Journal of theoretical biology.

[51]  John Paul Shen,et al.  A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[52]  Dimitrios S. Nikolopoulos,et al.  Parallel Programming Models for Heterogeneous Multicore Architectures , 2010, IEEE Micro.

[53]  Pat Hanrahan,et al.  Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.

[54]  Stephen W. Poole,et al.  Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors , 2010, J. Comput. Phys..

[55]  Bernabé Linares-Barranco,et al.  Memristance can explain Spike-Time-Dependent-Plasticity in Neural Synapses , 2009 .

[56]  Wu-chun Feng,et al.  Multi-dimensional characterization of temporal data mining on graphics processors , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[57]  John R. Gilbert,et al.  An empirical study of the performance and productivity of two parallel programming models , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[58]  Faith Ellen,et al.  The Complexity of Computation on the Parallel Random Access Machine , 1993 .

[59]  RapidCT: Acceleration of 3D Computed Tomography on GPUs , .

[60]  Hyesoon Kim,et al.  An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[61]  Samuel Williams,et al.  Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms , 2009, J. Parallel Distributed Comput..

[62]  Robert A. van de Geijn,et al.  Anatomy of high-performance matrix multiplication , 2008, TOMS.

[63]  Firas Hamze,et al.  A Performance Comparison of CUDA and OpenCL , 2010, ArXiv.

[64]  Peter J. Bentley,et al.  Hardware Implementation of a Bio-plausible Neuron Model for Evolution and Growth of Spiking Neural Networks on FPGA , 2008, 2008 NASA/ESA Conference on Adaptive Hardware and Systems.

[65]  Alan D. George,et al.  An analytical model for multilevel performance prediction of Multi-FPGA systems , 2011, TRETS.

[66]  Anders Lansner,et al.  [title not recoverable; source shows only a publisher copyright notice] , 2022 .

[67]  Samuel Williams,et al.  Sparse Matrix-Vector Multiplication on Multicore and Accelerators , 2010, Scientific Computing with Multicore and Accelerators.

[68]  W. Rall Branching dendritic trees and motoneuron membrane resistivity. , 1959, Experimental neurology.

[69]  Samuel Williams,et al.  Auto-Tuning the 27-point Stencil for Multicore , 2009 .

[70]  Witold R. Rudnicki,et al.  An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[71]  Martin Hopkins,et al.  Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.

[72]  Vivek K. Pallipuram,et al.  A comparative study of GPU programming models and architectures using neural networks , 2011, The Journal of Supercomputing.

[73]  G. Edelman,et al.  Large-scale model of mammalian thalamocortical systems , 2008, Proceedings of the National Academy of Sciences.

[74]  A. Hodgkin,et al.  A quantitative description of membrane current and its application to conduction and excitation in nerve , 1952, The Journal of physiology.

[75]  Lyle N. Long,et al.  Character Recognition using Spiking Neural Networks , 2007, 2007 International Joint Conference on Neural Networks.

[76]  A. Marowka Towards High-Level Parallel Programming Models for Multicore Systems , 2008, 2008 Advanced Software Engineering and Its Applications.

[77]  Sadaf R. Alam,et al.  Impact of multicores on large-scale molecular dynamics simulations , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[78]  Changjian Gao,et al.  Cortical Models Onto CMOL and CMOS— Architectures and Performance/Price , 2007, IEEE Transactions on Circuits and Systems I: Regular Papers.

[79]  William Gropp,et al.  An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.