Special-purpose parallel architectures for high-performance machine learning
The research presented here aims at developing special-purpose VLSI modules to be used as components of fully autonomous, massively parallel systems for real-time "adaptive" applications based on machine learning techniques. In particular, neural networks inspired by biological models can be realized in low-cost digital VLSI, maximizing the concurrency of operations by using low-precision weights and low-accuracy signal processing. The training problem is solved by the Reactive Tabu Search (RTS) algorithm [1], which achieves the same recognition performance as high-performance scientific workstations and a speed competitive with that of state-of-the-art supercomputers, at a much lower cost. The digital data-stream SIMD computational structure was used as the paradigm for the development of bit-parallel and bit-serial architectures. The TOTEM chip was developed to implement bit-parallel architectures. It comprises an array of 32 parallel processors with closely coupled on-chip weight memory and control logic, plus broadcast and output buses. The chip was fabricated in a 1.2 µm CMOS process with about 250,000 transistors on an area of 70 mm². The measured cycle time is under 30 ns, for a sustained performance of 1.0 giga multiply-accumulate operations/s with a 4-cycle pipeline latency. The high composability of the processor makes it an ideal building block for systems with massive parallelism. The on-chip storage of the parameters limits the input/output requirements and permits direct domain-decomposition and pipeline schemes at the system level.
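To make the training scheme concrete, the following is a minimal C sketch of a reactive-tabu-search loop over low-precision weights. Everything specific here is an illustrative assumption rather than the published algorithm of [1]: the 4-bit weight range, the tiny AND dataset, the single-neuron error function, and the crude hash-based repetition detector that drives the reactive tenure adjustment.

```c
#include <stdio.h>

#define N_W    3            /* two inputs plus bias for one neuron    */
#define W_MIN (-8)          /* assumed 4-bit low-precision weights    */
#define W_MAX   7
#define ITERS 200

static const int X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
static const int Y[4]    = {0,0,0,1};       /* toy target: logical AND */

static int error(const int w[N_W])          /* misclassification count */
{
    int e = 0;
    for (int p = 0; p < 4; p++) {
        int s = w[0]*X[p][0] + w[1]*X[p][1] + w[2];
        e += ((s > 0) != Y[p]);
    }
    return e;
}

static unsigned hash_w(const int w[N_W])    /* crude configuration hash */
{
    unsigned h = 2166136261u;
    for (int j = 0; j < N_W; j++)
        h = (h ^ (unsigned)(w[j] - W_MIN)) * 16777619u;
    return h & 1023u;
}

int main(void)
{
    int w[N_W] = {0, 0, 0}, best = error(w);
    int tabu_until[N_W] = {0, 0, 0};        /* per-weight tabu expiry   */
    int tenure = 2;                         /* reactive tabu tenure     */
    unsigned char seen[1024] = {0};         /* repetition detector      */

    for (int t = 1; t <= ITERS && best > 0; t++) {
        int bj = -1, bd = 0, bcost = 1 << 30;
        for (int j = 0; j < N_W; j++) {     /* scan non-tabu moves      */
            if (t < tabu_until[j]) continue;
            for (int d = -1; d <= 1; d += 2) {
                int nw = w[j] + d;
                if (nw < W_MIN || nw > W_MAX) continue;
                w[j] = nw;                  /* try the move...          */
                int c = error(w);
                w[j] = nw - d;              /* ...and undo it           */
                if (c < bcost) { bcost = c; bj = j; bd = d; }
            }
        }
        if (bj < 0) continue;               /* all moves tabu this step */
        w[bj] += bd;                        /* take best non-tabu move  */
        tabu_until[bj] = t + tenure;        /* forbid undoing it soon   */
        unsigned h = hash_w(w);
        if (seen[h]) tenure++;              /* reaction: revisited a    */
        else seen[h] = 1;                   /* configuration -> longer  */
        if (bcost < best) best = bcost;     /* tabu tenure              */
    }
    printf("best error = %d, weights = (%d,%d,%d)\n",
           best, w[0], w[1], w[2]);
    return 0;
}
```

The point of the sketch is that the search needs only integer add/compare operations over a discrete weight space, which is why such training maps well onto low-precision digital hardware.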
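The broadcast SIMD multiply-accumulate organization can likewise be sketched in software. The sizes and names below are assumptions for illustration (32 processors matching the chip, a hypothetical 64-input layer, 8-bit weights); in hardware the inner per-processor loop runs as 32 concurrent MACs in a single cycle, but it is serialized here for clarity.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PE   32   /* parallel processors on the chip               */
#define N_INPUTS 64   /* hypothetical number of inputs per neuron      */

/* Per-processor weight memory: weights[pe][i] is the low-precision
 * 8-bit weight that processor `pe` applies to broadcast input i.      */
static int8_t weights[NUM_PE][N_INPUTS];

/* One layer evaluation. Each "cycle", one input is broadcast on the
 * shared bus; every processor multiplies it by its own locally stored
 * weight and accumulates into a wide local register.                  */
static void broadcast_mac(const int8_t x[N_INPUTS], int32_t acc[NUM_PE])
{
    for (int pe = 0; pe < NUM_PE; pe++)
        acc[pe] = 0;
    for (int i = 0; i < N_INPUTS; i++) {   /* one broadcast per cycle  */
        int8_t xi = x[i];                  /* value shared by all PEs  */
        for (int pe = 0; pe < NUM_PE; pe++)
            acc[pe] += (int32_t)weights[pe][i] * xi;
    }
}

int main(void)
{
    int8_t  x[N_INPUTS];
    int32_t acc[NUM_PE];
    for (int i = 0; i < N_INPUTS; i++) {   /* toy data: fixed pattern  */
        x[i] = (int8_t)(i % 5 - 2);
        for (int pe = 0; pe < NUM_PE; pe++)
            weights[pe][i] = (int8_t)((pe + i) % 7 - 3);
    }
    broadcast_mac(x, acc);
    printf("neuron 0 output: %ld\n", (long)acc[0]);
    return 0;
}
```

Because each processor keeps its weight column on-chip, only the broadcast inputs and the final accumulator values cross the chip boundary; splitting N_INPUTS across several such arrays gives the domain-decomposition and pipeline schemes mentioned above.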
[1] R. Battiti et al., "TOTEM: a digital processor for neural networks and Reactive Tabu Search," Proceedings of the Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, 1994.
[2] R. Battiti et al., "Training neural nets with the reactive tabu search," IEEE Transactions on Neural Networks, 1995.