A Large-Scale Architecture for Restricted Boltzmann Machines

Deep Belief Nets (DBNs), which use Restricted Boltzmann Machines (RBMs) as their basic building block, are an emerging application in the machine learning domain. Although small-scale DBNs have shown great potential, the computational cost of RBM training has been a major obstacle to scaling to large networks. In this paper we present a highly scalable architecture for Deep Belief Net processing on hardware systems that can accommodate hundreds of boards, if not more, of customized logic with a near-linear increase in performance. We elucidate the tradeoffs between flexibility in the neuron connections and the hardware resources, such as memory and communication bandwidth, required to build a custom processor design of optimal efficiency. We illustrate how our architecture can easily support sparse networks with dense regions of connections between neighboring sets of neurons, which is relevant to applications with clear spatial correlations in the data, such as image processing. We demonstrate the feasibility of our approach by implementing a multi-FPGA system. We show that a speedup of 46X-112X over an optimized single-core CPU implementation can be achieved with a four-FPGA implementation.
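
For readers unfamiliar with the RBM training that the architecture accelerates, the sketch below shows one step of contrastive divergence (CD-1) for a binary RBM in plain NumPy. It illustrates the standard algorithm only, not the hardware design described in this paper, and the function and variable names (e.g. `cd1_step`, the layer sizes) are placeholders chosen for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    W      : (n_visible, n_hidden) weight matrix
    b_vis  : (n_visible,) visible biases
    b_hid  : (n_hidden,)  hidden biases
    v0     : (batch, n_visible) batch of binary training vectors
    """
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(v0.dtype)

    # Negative phase: reconstruct the visibles, then recompute hidden probabilities.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(v0.dtype)
    p_h1 = sigmoid(v1 @ W + b_hid)

    # Parameter updates from the difference of data and model correlations.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / batch
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

# Example usage on a small random problem (sizes are arbitrary).
rng = np.random.default_rng(1)
W = 0.01 * rng.standard_normal((784, 256))
b_v, b_h = np.zeros(784), np.zeros(256)
v = (rng.random((32, 784)) < 0.5).astype(float)
W, b_v, b_h = cd1_step(W, b_v, b_h, v)
```

The matrix-vector products in the positive and negative phases dominate this computation, which is why a custom hardware pipeline for them yields the speedups reported above.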