Generated Horizontal and Vertical Data Parallel GCA Machines for the N-Body Force Calculation

The Global Cellular Automata (GCA) model is a massively parallel computation model that generalizes the Cellular Automata model. A GCA cell contains data and link information. Using the link information, each cell has dynamic read access to any other cell in the field. The data and link information are updated in every generation. The GCA model is applicable and efficient for a large range of parallel algorithms (sorting, vector reduction, graph algorithms, matrix computations, etc.). The experimental language GCAL was developed to describe algorithms for the GCA model. GCAL programs can be transformed automatically into a data parallel architecture (DPA). Using the N-body problem as an example, this paper shows how the force calculation between the masses can be described in GCAL and synthesized into a data parallel architecture. First, the GCAL description of the application is transformed into a Verilog description, which is inserted into a Verilog template describing the general DPA. The complete Verilog code is then used as input for an FPGA synthesis tool, which generates the application-specific DPA. Two different DPAs are generated, a "horizontal" and a "vertical" DPA. The horizontal DPA uses 17 floating-point operators in each deep pipeline. In contrast, the vertical DPA uses only one floating-point operator at a time out of a set of 6 floating-point operators. Both architectures are compared with respect to resource consumption, time per cell operation, and cost (logic elements * execution time). The horizontal DPA turned out to be approximately 15 times more cost efficient than the vertical DPA.
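To make the cell model concrete, the following Python sketch simulates a GCA-style force calculation sequentially. It is not the GCAL program from the paper and does not model either DPA. It assumes a simple scheme in which each cell holds one mass, its position, an accumulated force, and a link. In every generation a cell reads the cell its link points to, adds the pairwise gravitational force, and advances its link by one. The function name `gca_nbody_forces`, the rotating-link scheme, and the parameters `G` and `eps` are illustrative assumptions.

```python
import math


def gca_nbody_forces(masses, positions, G=1.0, eps=0.0):
    """Hypothetical GCA-style sketch: accumulate pairwise forces per cell.

    masses    -- list of floats, one per cell
    positions -- list of (x, y, z) tuples, one per cell
    G, eps    -- gravitational constant and softening length (assumed parameters)
    """
    n = len(masses)
    forces = [(0.0, 0.0, 0.0)] * n
    # Link information: cell i initially points to its right neighbour.
    links = [(i + 1) % n for i in range(n)]

    # After n - 1 generations every cell has read every other cell exactly once.
    for _ in range(n - 1):
        new_forces = []
        new_links = []
        # All cells update synchronously: read the old state, write the new state.
        for i in range(n):
            j = links[i]  # global read access through the link
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dz = positions[j][2] - positions[i][2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            scale = G * masses[i] * masses[j] / (r2 * math.sqrt(r2))
            fx, fy, fz = forces[i]
            new_forces.append((fx + scale * dx, fy + scale * dy, fz + scale * dz))
            new_links.append((j + 1) % n)  # the link is updated as well
        forces, links = new_forces, new_links
    return forces
```

For two masses of 1 and 2 at unit distance with `G = 1`, each cell accumulates a force of magnitude 2 directed toward the other. The inner loop over `i` is what a hardware realization would execute in parallel. Here it is run sequentially for illustration only.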
