An Acceleration Processor for Data Intensive Scientific Computing

Scientific computations for diffusion equations and artificial neural networks (ANNs) are data-intensive tasks with heavy memory access but relatively low computational complexity. Such tasks therefore map naturally onto SIMD (single instruction, multiple data) parallel processing with distributed memory. This paper proposes a high-performance acceleration processor whose architecture is optimized for scientific computing with diffusion equations and ANNs. The proposed architecture comprises a customized instruction set and dedicated on-chip hardware resources: a control unit (CU), 16 processing units (PUs), and a non-linear function unit (NFU). These units are connected by a dedicated ring and a global bus. Each PU is equipped with an address modifier (AM) and a 16-bit, 1.5 k-word local memory (LM). The processor can be expanded through a multi-chip expansion mode to accommodate large-scale parallel computation. The prototype chip is implemented on an FPGA; its total gate count is about 1 million, with 530,432-bit embedded memory cells, and it operates at 15 MHz. The functionality and performance of the proposed processor are verified with an oil reservoir simulation using diffusion equations and a character recognition application using ANNs. The execution times of the two applications are compared with software implementations on a 1.7 GHz Pentium IV personal computer. Although the processor architecture and instruction set are optimized for diffusion equations and ANNs, they remain flexible enough to program many other scientific computation algorithms.

key words: SIMD, FPGA, artificial neural networks, diffusion equations, image processing
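To make the mapping concrete, the following is a minimal sketch of how one diffusion update could be spread over 16 PUs that each hold a slice of the grid in local memory and exchange boundary cells with their ring neighbours. It is an illustration only, not the processor's instruction set or the authors' program: the explicit 1-D finite-difference scheme, the periodic boundary, the use of floating-point values instead of 16-bit words, and all function names are assumptions introduced here.

```python
NUM_PUS = 16  # number of processing units stated in the abstract


def split(u, parts):
    """Distribute a grid evenly over the PUs' local memories (assumes len(u) % parts == 0)."""
    size = len(u) // parts
    return [u[k * size:(k + 1) * size] for k in range(parts)]


def join(chunks):
    """Gather the local memories back into one grid."""
    return [x for chunk in chunks for x in chunk]


def ring_diffusion_step(local_mems, alpha):
    """One explicit diffusion step; every PU applies the same update to its own slice."""
    n = len(local_mems)
    # Ring transfer phase: each PU receives one boundary cell from each ring neighbour.
    left_halo = [local_mems[(p - 1) % n][-1] for p in range(n)]
    right_halo = [local_mems[(p + 1) % n][0] for p in range(n)]
    new_mems = []
    for p in range(n):  # conceptually simultaneous across PUs (SIMD)
        padded = [left_halo[p]] + local_mems[p] + [right_halo[p]]
        new_mems.append([
            padded[i] + alpha * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ])
    return new_mems


def reference_step(u, alpha):
    """Sequential version of the same update, used only to check the partitioned one."""
    n = len(u)
    return [u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]


if __name__ == "__main__":
    grid = [0.0] * 64
    grid[0] = 1.0  # single initial concentration spike
    stepped = join(ring_diffusion_step(split(grid, NUM_PUS), alpha=0.25))
    print(stepped[:3], stepped[-1])
```

Each PU touches only its own slice plus two neighbour values per step, which is the access pattern that makes this class of problem a natural fit for distributed local memory and a ring interconnect.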
