Capella: Customizing Perception for Edge Devices by Efficiently Allocating FPGAs to DNNs

Deep neural networks (DNNs) have recently attracted renewed interest for deployment in edge applications. However, such deployments are challenging because executing a DNN often requires more resources than an individual edge device provides. On the other hand, relying on model-level distribution methods to run a DNN across connected edge devices incurs costly communication overhead. To utilize available in-the-edge resources with less communication overhead, we propose edge-tailored models composed of nearly independent narrow DNNs, whose inference is accelerated by small, cost-efficient RISC-based engines. We implement these engines on PYNQ boards, a platform that mimics the limited resources of edge devices. We create the narrow DNNs based on the resources available on the PYNQ boards and allocate each narrow DNN to one engine, implemented on an FPGA. We compare the communication overhead of our implementation against state-of-the-art model-level distribution methods.