A study on multi-dimensional configurable processor array in hardware and software complex architecture

High performance computing (HPC) typically addresses complex scientific, engineering, and business problems that require enormous computational capability. Mainstream HPC solutions are implemented on supercomputers composed of a large number of general-purpose processors acting as computing nodes on a network, which meets the demands of most HPC applications. Because conventional HPC systems built from CPUs are limited by power and heat constraints, such systems have had to be composed of a much larger number of lower-power, lower-performance cores. High performance with low power consumption is therefore required. Recently, GPGPUs, which comprise thousands of cores, have been commonly used to accelerate HPC in many studies, but the performance actually achieved varies greatly from application to application relative to peak performance. In addition, the data communication bottleneck among computing nodes can also be solved through various approaches such as optical communication. An FPGA (Field Programmable Gate Array) is an LSI on which a processor circuit tailored to a particular application can be implemented. With the development of FPGA technology, many HPC applications can be accelerated by using FPGAs to deliver enormous performance. Configurable HPC systems that aggregate many FPGAs can be widely used in HPC to achieve high performance at low power consumption. We constructed a configurable processor array based on a multidimensional FPGA array, named Virtual Object by Configurable Array of Little Scalable Engine (Vocalise). The proposed system has the following features: 1. Design and development are efficient and easy for various applica-
