In this paper, a digit-serial systolic array is proposed for computing the modular multiplication $A(x)B(x) \bmod G(x)$ in finite fields $GF(2^m)$ with the standard basis representation. From the multiplication algorithm in $GF(2^m)$, we obtain a new dependence graph and design an efficient digit-serial systolic multiplier. If input data come in continuously, the proposed array can produce multiplication results at a rate of one every $\lceil m/L \rceil$ clock cycles, where $L$ is the selected digit size. The analysis shows that the proposed architecture reduces the computational delay time and has a much simpler structure than the existing digit-serial systolic multiplier. Furthermore, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation with fault-tolerant design.

Key-Words: Digit-Serial Architecture, Finite Field Multiplier, Cryptography, Systolic Array, Finite Field Arithmetic, VLSI.

1 Introduction

Finite fields, or Galois fields, $GF(2^m)$ have played an important role in many communication applications, such as error-correcting codes [1] and cryptography [9]. Because addition in $GF(2^m)$ is a bit-independent XOR operation, it can be implemented in a fast and inexpensive way. Multiplication, on the other hand, is more complicated and expensive. Furthermore, it is the most common arithmetic operation in $GF(2^m)$. Thus, it is desirable to design a hardware-efficient multiplier for $GF(2^m)$ that meets the real-time requirement with minimum hardware complexity [3]. Many approaches and architectures have been proposed to perform multiplication in $GF(2^m)$ [2][3][4][5][14]. The multipliers for $GF(2^m)$ can be classified into four types: bit-parallel, bit-serial, super-serial, and digit-serial architectures. A bit-parallel system generally achieves much better throughput than the others, but at the cost of much higher hardware complexity.
To improve the trade-off between throughput and hardware complexity, digit-serial architectures have been proposed [3][7][8]. In a digit-serial system, the data words are first partitioned into digits of several bits each, and then processed and transmitted on a digit-by-digit basis. Suppose that the word size is $m$ bits, the digit size is $L$ bits, and $N = \lceil m/L \rceil$. Then bit-parallel, bit-serial, and super-serial systems process the input data at rates of $m$ bits, one bit, and less than one bit per clock cycle, respectively, while a digit-serial system processes the input data at a rate of $L$ bits per clock cycle. In other words, a digit-serial system yields one output result every $N$ clock cycles, while bit-parallel, bit-serial, and super-serial systems yield one output result every one, $m$, and more than $m$ clock cycles, respectively. If the digit size is chosen appropriately, a digit-serial architecture can meet the throughput requirement of a given application with minimum hardware.

In this paper, we first review the multiplication algorithm in $GF(2^m)$ and then derive a new dependence graph (DG). Based on the new DG, we propose an efficient digit-serial systolic multiplier. Its architecture reduces the computational delay time and has a simpler structure than Guo and Wang's architecture [3]. In addition, if the input data come in continuously, the proposed array can produce results at a rate of one per $N$ clock cycles after an initial delay of $3N$ clock cycles, which is the same as in the previous work [3]. Finally, since the new architecture has the features of regularity, modularity, and unidirectional data flow, it is well suited to VLSI implementation with fault-tolerant design.

This work was supported by grant No. 2000-2-51200-001-2 from the Basic Research Program of the Korea Science & Engineering Foundation.

2 The Multiplication Algorithm

In this section, we first review the multiplication algorithm [2][5]. Let $A(x)$ and $B(x)$ be two elements in $GF(2^m)$, $G(x)$ be the primitive polynomial used to generate the field, and $P(x)$ be the product $A(x)B(x) \bmod G(x)$.
For each polynomial, the coefficients are the binary digits 0 and 1.

$$
\begin{aligned}
A(x) &= a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \cdots + a_1 x + a_0 \\
B(x) &= b_{m-1}x^{m-1} + b_{m-2}x^{m-2} + \cdots + b_1 x + b_0 \\
G(x) &= x^{m} + g_{m-1}x^{m-1} + g_{m-2}x^{m-2} + \cdots + g_1 x + g_0 \\
P(x) &= p_{m-1}x^{m-1} + p_{m-2}x^{m-2} + \cdots + p_1 x + p_0
\end{aligned}
\tag{1}
$$

In equation (1), each element is a residue mod $G(x)$, and all arithmetic operations are performed modulo 2. As described in [2], $P(x)$ can be computed recursively as follows.

$$
\begin{aligned}
T_0(x) &= 0 \\
T_i(x) &= [T_{i-1}(x)\,x] \bmod G(x) + A(x)\,b_{m-i}, \quad i = 1, 2, \ldots, m \\
P(x) &= T_m(x)
\end{aligned}
\tag{2}
$$

After $m$ iterations, the result $P(x)$ is obtained. Defining $T_i(x) = t_{i,m-1}x^{m-1} + t_{i,m-2}x^{m-2} + \cdots + t_{i,1}x + t_{i,0}$ and substituting it into equation (2), it can be derived that

$$
t_{i,k} = t_{i-1,m-1}\,g_k + b_{m-i}\,a_k + t_{i-1,k-1}, \quad 0 \le k \le m-1,
$$

with $t_{i-1,-1} = 0$. Based on this algorithm, a DG can be derived as shown in Fig. 1. In general, the DG consists of $m \times m$ basic cells for multiplication in $GF(2^m)$; in the DG of Fig. 1, $m = 9$. Fig. 2 shows the architecture of the basic cell, which consists of two 2-input AND gates and one 3-input XOR gate. The cells in the $i$th row of the array perform the operations of the $i$th iteration, with each basic cell computing one coefficient. The coefficients of the result $P(x)$ emerge from the bottom row of the array after $m$ iterations.

3 A Digit-Serial Systolic Multiplier

Let $L$ be the digit size, let $N = m/L$ be an integer, and let $A_i$, $B_i$, $G_i$, and $P_i$ ($0 \le i \le N-1$) be the digits of the coefficients of $A(x)$, $B(x)$, $G(x)$, and $P(x)$, respectively. Each digit consists of $L$ bits: the digit $A_i = (a_{iL+L-1}, a_{iL+L-2}, \ldots, a_{iL+1}, a_{iL})$, and the digits $B_i$, $G_i$, and $P_i$ are defined similarly.

3.1 Modification of DG and Basic Cell

As the first step in constructing the new systolic array, we combine $L \times L$ basic cells of Fig. 1 into a new basic cell. Fig. 3 shows the modified DG, where $L = 3$ and $N = m/L = 3$, and Fig. 4 shows the circuit of the corresponding modified basic cell. As shown in Fig. 3, the modified DG consists of $N \times N$ basic cells. In the modified DG of Fig. 3, the digits $A_i$ and $G_i$ enter the $(1,i)$th basic cell, the digit $B_i$ enters the $(N-i,N-1)$th basic cell, and the digit $P_i$ emerges from the $(N,i)$th basic cell, where $0 \le i \le N-1$. Although the modified DG in Fig. 3 is highly regular, it cannot be mapped to a one-dimensional signal flow graph (SFG) array. Because the horizontal data flow in Fig. 3 is bidirectional, the DG cannot be projected along the east direction. In other words, the $(i,k)$th basic cell has to get the $(L-1)$ temporary results $t_{iL-1,kL-1}, t_{iL-2,kL-1}, \ldots, t_{iL-(L-1),kL-1}$ from the right neighboring $(i,k-1)$th basic cell.

[Fig. 1: Dependence graph for multiplication in $GF(2^m)$ with $m = 9$, a $9 \times 9$ array of basic cells labeled $(i,k)$ for $1 \le i \le 9$ and $0 \le k \le 8$.]
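The recursion of Section 2 and the digit partition of Section 3 can be checked in software. The following is a minimal sketch, not the proposed hardware: it assumes polynomials are stored as Python integers whose bit k is the coefficient of x^k, and that `g` holds only the low m coefficients of G(x), with the x^m term implicit. The function names `gf2m_mul_msb_first`, `gf2m_mul_cells`, and `split_digits` are hypothetical and are introduced here only for illustration.

```python
def gf2m_mul_msb_first(a, b, g, m):
    """Compute A(x)B(x) mod G(x) with the recursion
    T_i = [T_{i-1} * x] mod G + A * b_{m-i}, for i = 1..m (T_0 = 0)."""
    mask = (1 << m) - 1
    t = 0
    for i in range(1, m + 1):
        msb = (t >> (m - 1)) & 1      # t_{i-1,m-1}
        t = (t << 1) & mask           # multiply by x and drop the x^m term
        if msb:
            t ^= g                    # x^m is replaced by g_{m-1}x^{m-1}+...+g_0
        if (b >> (m - i)) & 1:        # b_{m-i}
            t ^= a                    # addition in GF(2^m) is XOR
    return t


def gf2m_mul_cells(a, b, g, m):
    """Same product, evaluated cell by cell as in the m x m dependence graph:
    t_{i,k} = t_{i-1,m-1} g_k + b_{m-i} a_k + t_{i-1,k-1}, with t_{i-1,-1} = 0."""
    a_bits = [(a >> k) & 1 for k in range(m)]
    g_bits = [(g >> k) & 1 for k in range(m)]
    t_prev = [0] * m                  # row 0 of the array, T_0(x) = 0
    for i in range(1, m + 1):
        b_bit = (b >> (m - i)) & 1
        t_next = [0] * m
        for k in range(m):
            shifted_in = t_prev[k - 1] if k > 0 else 0
            # One basic cell: two 2-input ANDs feeding one 3-input XOR.
            t_next[k] = (t_prev[m - 1] & g_bits[k]) ^ (b_bit & a_bits[k]) ^ shifted_in
        t_prev = t_next
    return sum(bit << k for k, bit in enumerate(t_prev))


def split_digits(value, m, L):
    """Partition an m-bit word into N = m/L digits of L bits each.
    Digit i holds bits iL .. iL+L-1, so index 0 is the least significant digit."""
    if m % L != 0:
        raise ValueError("this sketch assumes L divides m")
    digit_mask = (1 << L) - 1
    return [(value >> (i * L)) & digit_mask for i in range(m // L)]
```

As a small usage example, in GF(2^3) with G(x) = x^3 + x + 1 (so `g = 0b011`), `gf2m_mul_msb_first(2, 4, 0b011, 3)` multiplies x by x^2 and returns 3, that is, x + 1. For m = 9 and L = 3, `split_digits(0b101110011, 9, 3)` returns `[3, 6, 5]`, the digits A_0, A_1, A_2. The cell-level function performs m × m cell evaluations, one per basic cell of the DG, which makes the row-by-row data dependencies explicit at the cost of speed.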
[1] Chin-Liang Wang, et al. Systolic array implementation of multipliers for finite fields GF(2^m), 1991.
[2] Peter J. Ashenden, et al. The Designer's Guide to VHDL, 1995.
[3] C.-L. Wang, et al. Digit-serial systolic multiplier for finite fields GF(2^m), 1998.
[4] Keshab K. Parhi. A systematic approach for design of digit-serial signal processing architectures, 1991.
[5] Keshab K. Parhi, et al. Efficient semisystolic architectures for finite-field arithmetic, 1998, IEEE Trans. Very Large Scale Integr. Syst.
[6] Weng Fook Lee. VHDL Coding and Logic Synthesis with Synopsys, 2000.
[7] Sun-Yuan Kung, et al. On supercomputing with systolic/wavefront array processors, 1984.
[8] Kamran Eshraghian, et al. Principles of CMOS VLSI Design: A Systems Perspective, 1985.
[9] Christof Paar, et al. A super-serial Galois fields multiplier for FPGAs and its application to public-key algorithms, 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).
[10] 염흥렬, et al. [Book review] "Applied Cryptography", 1997.
[11] Keshab K. Parhi, et al. Efficient finite field serial/parallel multiplication, 1996, Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96.
[12] R. Blahut. Theory and practice of error control codes, 1983.
[13] Peter F. Corbett, et al. Digit-serial processing techniques, 1990.