Unified algebraic computations on cyclic permutation networks
This dissertation introduces a new concept for designing and implementing arithmetic processors using cyclic permutation networks. Unlike conventional designs, processors built on this concept unify addition, subtraction, and multiplication into a single operation: composing permutations. Numbers are coded into permutation maps, the requisite computations are carried out by composing permutations on a network of switches, and the resulting permutations are then converted back into sums and products. The dissertation realizes this concept with a cyclic permutation network and shows that both addition and multiplication modulo $M$ can be carried out on a $\lg^2 M/\lg\lg M$-input permutation network with $O(\lg^2 M)$ cost and $O(\lg\lg M)$ delay$^1$. Furthermore, in a residue number system (RNS) setting, conversion between binary and residue representations can also be carried out on such a network: binary to RNS with $O(\lg^2 M)$ cost and $O(\lg\lg M)$ delay, and RNS to binary with $O(\lg^2 M \lg\lg M)$ cost and $O(\lg^2 \lg M)$ delay. Expressed at the bit level, where $M$ is an $n$-bit number, these complexities translate to $O(n^2)$ cost and $O(\lg n)$ delay for addition, multiplication, and conversion from binary to RNS. Conversion from RNS to binary requires $O(n^2 \lg n)$ cost and $O(\lg^2 n)$ delay. These complexities are competitive with, or compare favorably to, those of previously reported arithmetic circuits for RNS computations.
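The idea of computing by composing permutations can be illustrated in software. The following is a minimal sketch, not the dissertation's network construction: it assumes a residue a modulo m is coded as the cyclic shift i -> (i + a) mod m, so that composing two shifts adds the residues and composing with an inverse shift subtracts them. The multiplication routine assumes a prime modulus p and a primitive root g (both hypothetical parameters chosen for illustration) and uses discrete logarithms to turn a product into a composition of shifts modulo p - 1; the abstract does not say whether the dissertation uses this particular coding. All function names are illustrative.

```python
def shift_perm(a, m):
    """Code residue a (mod m) as the cyclic shift i -> (i + a) mod m."""
    return tuple((i + a) % m for i in range(m))


def compose(p, q):
    """Return the permutation 'apply p, then q'."""
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def decode(p):
    """Recover the residue from a cyclic shift: it is the image of 0."""
    return p[0]


def add_mod(a, b, m):
    """Addition mod m as a composition of two cyclic shifts."""
    return decode(compose(shift_perm(a, m), shift_perm(b, m)))


def sub_mod(a, b, m):
    """Subtraction mod m: compose with the inverse shift."""
    return decode(compose(shift_perm(a, m), inverse(shift_perm(b, m))))


def mul_mod_prime(a, b, p, g):
    """Multiplication mod a prime p via discrete logs (assumed coding).

    g must be a primitive root mod p. Nonzero residues are mapped to
    exponents of g, and the exponents are added by composing shifts mod p - 1.
    """
    if a % p == 0 or b % p == 0:
        return 0
    log = {pow(g, x, p): x for x in range(p - 1)}
    idx = decode(compose(shift_perm(log[a % p], p - 1),
                         shift_perm(log[b % p], p - 1)))
    return pow(g, idx, p)


total = add_mod(5, 4, 7)              # (5 + 4) mod 7
diff = sub_mod(3, 5, 7)               # (3 - 5) mod 7
product = mul_mod_prime(3, 5, 7, 3)   # (3 * 5) mod 7, with 3 a primitive root of 7
```

Each permutation here is stored as a tuple of length m, which mirrors one switch setting per input; the sketch makes no attempt to reproduce the cost and delay bounds stated above.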
We extend this concept and incorporate it into a reconfigurable bus system to design inner product processors, under the assumption that once the switch state of each processor is set for a given connection, signals can be broadcast in constant time. Based on this assumption, we propose a cyclic permutation network arithmetic processor that computes the real inner product, the complex inner product, and matrix multiplication, each in $O(1)$ time, using $O(N)$, $O(N)$, and $O(N^3)$ processors, respectively. Here the basic steps of addition and multiplication each take one unit of time and one unit of cost, $N$ is the number of components of each vector, and the matrices are of size $N \times N$.
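To make the arithmetic behind an RNS inner product concrete, here is a small sequential sketch. It does not model the reconfigurable bus or the constant-time broadcast assumption; it only shows that an inner product can be accumulated independently in each residue channel and recovered with the Chinese remainder theorem. The moduli (7, 11, 13) and all function names are hypothetical choices for illustration, and the true result must be smaller than the product of the moduli (1001 here) to be recovered exactly.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 7 * 11 * 13 = 1001.
MODULI = (7, 11, 13)


def to_rns(x, moduli=MODULI):
    """Binary-to-RNS conversion: reduce x modulo each modulus."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli=MODULI):
    """RNS-to-binary conversion via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def rns_inner_product(u, v, moduli=MODULI):
    """Accumulate sum(u[i] * v[i]) separately in each residue channel."""
    acc = [0] * len(moduli)
    for a, b in zip(u, v):
        ra, rb = to_rns(a, moduli), to_rns(b, moduli)
        for c, m in enumerate(moduli):
            acc[c] = (acc[c] + ra[c] * rb[c]) % m
    return from_rns(tuple(acc), moduli)


u, v = [3, 1, 4, 1], [2, 7, 1, 8]
result = rns_inner_product(u, v)  # equals 3*2 + 1*7 + 4*1 + 1*8 when below 1001
```

Because the channels never interact until the final conversion, each one could in principle be handled by its own modular processor, which is the property the parallel design above relies on.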
Finally, we propose a new approach for concurrent error detection in a cyclic permutation network modulo arithmetic processor, combining r-out-of-s residue codes and Berger codes. This fault-tolerant arithmetic processor can detect any number of faulty modules without any redundant moduli. In addition, it can tolerate L faults if L redundant moduli are used, and it degrades gracefully when the number of faulty modules exceeds L.
$^1$ $\lg m$ denotes $\log_2 m$.
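For readers unfamiliar with the two code families named above, the following sketch shows their standard textbook definitions only. How the dissertation combines them inside the processor is not described in this abstract, so nothing here should be read as its scheme. A Berger code appends the binary count of zeros in the information bits; an r-out-of-s code accepts exactly those s-bit words that contain r ones. Function names are illustrative.

```python
def berger_encode(bits):
    """Append the Berger check symbol: the number of 0s in bits, in binary."""
    k = len(bits)
    width = k.bit_length()          # equals ceil(log2(k + 1)) for k >= 1
    zeros = bits.count(0)
    check = [(zeros >> i) & 1 for i in reversed(range(width))]
    return bits + check


def berger_ok(word, k):
    """Check a received word whose first k bits are information bits."""
    info, check = word[:k], word[k:]
    value = int("".join(map(str, check)), 2)
    return info.count(0) == value


def r_out_of_s_ok(word, r):
    """An r-out-of-s codeword has exactly r ones among its s bits."""
    return sum(word) == r


codeword = berger_encode([1, 0, 1, 1])   # one zero, so the check symbol is 001
corrupted = [0, 0, 0, 1] + codeword[4:]  # two 1 -> 0 errors in the information bits
```

Here `berger_ok(codeword, 4)` holds while `berger_ok(corrupted, 4)` fails, which illustrates the known property that Berger codes detect unidirectional errors (all 1 to 0, or all 0 to 1).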