Monte Carlo Simulations of Spin Glasses on Cell Broadband Engine

Several large-scale computational problems in science can be solved only on high-end computing systems. In recent years, multi-core architectures have delivered tens or hundreds of Gflops of peak computing performance on a single chip, with high power efficiency, making available computational power that was previously found only on high-end multi-processor systems. The aim of this Ph.D. thesis is to study the capability of multi-core processors for scientific programming. We analyze sustained performance and the issues related to multi-core programming, data distribution, and synchronization, in order to define a set of guidelines for optimizing scientific applications on this class of architectures. As an example of a multi-core processor, we consider the Cell Broadband Engine (CBE), developed by Sony, IBM and Toshiba. The CBE is one of the most powerful multi-core CPUs currently available, integrating eight cores and delivering a peak performance of 200 Gflops in single precision and 100 Gflops in double precision. As a case study, we analyze the performance of the CBE for Monte Carlo simulations of the Edwards-Anderson spin glass model, a paradigm in theoretical and condensed matter physics, used to describe complex systems characterized by phase transitions (such as the paramagnetic-ferromagnetic transition in magnets) or to model “frustrated” dynamics. We describe several strategies for distributing the data set among on-chip and off-chip memories, and propose analytic models to find the balance between computation time and memory access time as a function of both algorithmic and architectural parameters. We use these analytic models to set the parameters of the algorithm, such as the size of data structures and the scheduling of operations, in order to optimize the execution of Monte Carlo spin glass simulations on the CBE architecture.
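To make the simulated workload concrete, the following is a minimal sketch of a Metropolis sweep for a three-dimensional Edwards-Anderson model. It is an illustration only, not the CBE-optimized implementation studied in the thesis. It assumes bimodal couplings J = ±1, periodic boundary conditions, and the Hamiltonian H = -Σ J_ij s_i s_j over nearest-neighbour bonds; the abstract does not specify these choices, and all function names (`make_lattice`, `neighbour`, `local_field`, `energy`, `metropolis_sweep`) are hypothetical.

```python
import math
import random


def make_lattice(L, rng):
    """Random +/-1 spins and +/-1 couplings on an L x L x L periodic lattice."""
    n = L ** 3
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    # couplings[d][i] is the bond joining site i to its neighbour in direction +d
    couplings = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(3)]
    return spins, couplings


def neighbour(i, d, step, L):
    """Index of the site one step (+1 or -1) from site i along axis d."""
    stride = L ** d
    coord = (i // stride) % L
    return i + ((coord + step) % L - coord) * stride


def local_field(i, spins, couplings, L):
    """Sum of J_ij * s_j over the six nearest neighbours of site i."""
    h = 0
    for d in range(3):
        fwd = neighbour(i, d, +1, L)
        bwd = neighbour(i, d, -1, L)
        h += couplings[d][i] * spins[fwd]    # bond owned by site i
        h += couplings[d][bwd] * spins[bwd]  # bond owned by the backward neighbour
    return h


def energy(spins, couplings, L):
    """Total energy H = -sum J_ij s_i s_j, each bond counted once."""
    e = 0
    for i in range(L ** 3):
        for d in range(3):
            e -= couplings[d][i] * spins[i] * spins[neighbour(i, d, +1, L)]
    return e


def metropolis_sweep(spins, couplings, L, beta, rng):
    """Attempt one spin flip per site at inverse temperature beta."""
    for i in range(L ** 3):
        # Energy change if spin i is flipped
        dE = 2 * spins[i] * local_field(i, spins, couplings, L)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i] = -spins[i]
```

Each update reads one spin, its six neighbours, and six couplings, and performs only a few arithmetic operations. This low ratio of computation to data access is why the placement of the data set in on-chip or off-chip memory matters for sustained performance.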
