Techniques for algorithm design on the instruction systolic array

Instruction systolic arrays (ISAs) provide programmable high-performance hardware for specific computationally intensive applications. Typically, such an array is connected to a sequential host and operates like a coprocessor that handles only the computationally intensive tasks within a larger application. The ISA model is a mesh-connected processor grid that combines the advantages of special-purpose systolic arrays with the flexible programmability of general-purpose machines. The subject of this thesis is the analysis, design, and implementation of several special-purpose algorithms and subroutines on the ISA that take advantage of the special features of the systolic information flow. The ability of ISAs to perform parallel prefix computations extremely efficiently is exploited as a key operation for deriving efficient algorithms, together with local operations within each processor. Given sequential algorithms therefore have to be decomposed into simple building blocks of parallel prefix computations and parallel local operations. To adapt sequential algorithms for parallelisation, several techniques are introduced in this thesis, e.g. swapping of loops in the sequential algorithm, shearing of data, and appropriate mapping of input data onto the processor array. It is demonstrated how these techniques can be exploited to derive efficient ISA algorithms for several computationally intensive applications. These include cryptographic applications (e.g. arithmetic operations on long operands, RSA encryption, RSA key generation) and image processing applications (e.g. convolution, Wavelet Transform, morphological operators, median filter, Fourier Transform, Hough Transform, Morphological Hough Transform, and tomographic image reconstruction). Their implementation on Systola 1024, the first commercial parallel computer with the ISA architecture, shows that the ISA concept is well suited to these applications and yields significant run-time savings.
The results of this thesis emphasise the suitability of the ISA concept as an accelerator for computationally intensive applications in the areas of cryptography and image processing. This might lead research towards further high-speed, low-cost systems based on ISA hardware.
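
As a rough illustration of the decomposition described above, the following Python sketch expresses a matrix-vector product as one local operation per grid cell followed by a prefix computation along each row. It is a sequential simulation under simplifying assumptions, not ISA code and not taken from the thesis; the function names `local_multiply`, `row_prefix`, and `mat_vec` are hypothetical and chosen only for this example.

```python
from itertools import accumulate
from operator import add


def local_multiply(matrix, vector):
    """Local step: cell (i, j) multiplies its own entry by vector[j]."""
    return [[a * x for a, x in zip(row, vector)] for row in matrix]


def row_prefix(grid, op=add):
    """Prefix step: every row computes its running results from left to right."""
    return [list(accumulate(row, op)) for row in grid]


def mat_vec(matrix, vector):
    """Matrix-vector product built from the two blocks above.

    After the row-wise prefix sums, the last cell of each row holds
    the complete dot product of that row with the vector.
    Assumes a non-empty matrix whose rows match the vector length.
    """
    prefixes = row_prefix(local_multiply(matrix, vector))
    return [row[-1] for row in prefixes]
```

On a real processor grid, the local step would run in all cells at once and the prefix step would sweep through all rows simultaneously; the sketch only shows how a sequential loop nest can be rewritten in terms of these two kinds of building blocks.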
