Topical perspective on massive threading and parallelism.

Computer architectures have recently undergone a noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General-Purpose Graphics Processing Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, as in the 1.3-million-thread Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts, be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world, is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future.
