A dynamic arithmetic architecture: precision, power and performance considerations

Reconfigurable logic devices are widely used within the scientific computing community as hardware accelerators. This is due to their ability to implement parallel structures and to provide performance comparable to hardware-based designs, while maintaining flexibility similar to software-based designs. New features on current devices, such as tightly coupled embedded processors and support for run-time partial reconfiguration, open up a wide range of new applications. Run-time partial reconfiguration allows a system hosted in one of these devices to change (reconfigure) itself at will. Currently, the main consequence of this feature is the possibility of multiplexing functional units in time, effectively allowing a design to fit more logic than the device physically provides. In this dissertation, run-time partial reconfiguration is used to implement a dynamic precision arithmetic architecture. This solution allows a system to change a functional unit's arithmetic operation, its precision, or both, at will. Unlike in traditional run-time reconfiguration solutions, the exchangeable functional units in this solution are much smaller, which reduces the impact of reconfiguration time overhead on overall system performance. The architecture presented in this dissertation is fully scalable in performance, precision, and power consumption. Performance is scaled by increasing the number of coprocessor units. Precision is scaled by dynamically changing the functional units that control the numerical format used in the arithmetic operations. Power is scaled by shutting off coprocessors using run-time partial reconfiguration. All three factors are combined to formulate a model that can be extended to different configurations and to future devices and technologies. The architecture is ported to the Virtex 2 Pro and Virtex 4 families in order to compare how the families' architectural differences affect the effectiveness of the approach.
In terms of resources, the architecture is 75% smaller than an equivalent floating-point implementation and twice as large as an equivalent fixed-point implementation. A similar relationship holds for static power consumption. Although the architecture offers higher precision than floating point, it can only approach the numerical range of floating point at the cost of a performance penalty caused by the reconfiguration time overhead. A modified reconfiguration technique makes reconfiguration 60 times faster than the current standard. At a frequency of 100 MHz, this allows the precision to be reconfigured up to 250 times per second, and both precision and operation up to 150 times per second. The architecture outperforms an equivalent floating-point unit when reconfiguration is required no more often than once every 10,000 operations.
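
The reconfiguration figures quoted above can be put on a common scale with a short calculation. The sketch below is only illustrative and rests on two assumptions that the abstract does not state: that the quoted rates are back-to-back maxima (so their reciprocal bounds the time of one reconfiguration), and that the 100 MHz figure is the clock against which operations are counted. The function names are hypothetical and introduced only for this example.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the abstract.
# ASSUMPTION: the quoted reconfiguration rates are back-to-back maxima.
# ASSUMPTION: 100 MHz is the clock against which operations are counted.

CLOCK_HZ = 100_000_000            # "at a frequency of 100 MHz"
PRECISION_RECONFIGS_PER_S = 250   # precision-only reconfigurations per second
FULL_RECONFIGS_PER_S = 150        # precision-and-operation reconfigurations per second
BREAK_EVEN_OPS = 10_000           # operations per reconfiguration at the quoted threshold


def cycles_per_reconfig(reconfigs_per_second: int, clock_hz: int = CLOCK_HZ) -> float:
    """Upper bound on the cycles consumed by one reconfiguration (hypothetical helper)."""
    return clock_hz / reconfigs_per_second


def amortized_overhead(reconfig_cycles: float, ops_between_reconfigs: int) -> float:
    """Reconfiguration cycles charged to each operation when spread evenly (hypothetical helper)."""
    return reconfig_cycles / ops_between_reconfigs


if __name__ == "__main__":
    precision_cycles = cycles_per_reconfig(PRECISION_RECONFIGS_PER_S)
    full_cycles = cycles_per_reconfig(FULL_RECONFIGS_PER_S)
    print(f"precision-only reconfiguration: about {precision_cycles:,.0f} cycles")
    print(f"precision and operation:        about {full_cycles:,.0f} cycles")
    per_op = amortized_overhead(precision_cycles, BREAK_EVEN_OPS)
    print(f"amortized over {BREAK_EVEN_OPS:,} operations: about {per_op:.0f} cycles per operation")
```

Under these assumptions, a precision-only reconfiguration takes at most 4 ms, or 400,000 cycles, which amounts to 40 cycles per operation when spread over 10,000 operations. The abstract does not say which kind of reconfiguration the 10,000-operation threshold refers to, so this is a rough reading of the quoted numbers rather than a result from the dissertation.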