Heterogeneous Computing for Edge AI

Current artificial intelligence (AI) models with human-level accuracy offer promising business value in many application fields. The price, however, is high computational complexity and memory bandwidth, which makes it challenging to deploy AI on edge devices, where power and hardware resources are limited. In this paper, the design challenges of edge AI and the corresponding solutions from MediaTek are introduced. A dedicated parallel AI processor is embedded to improve computation and power efficiency. The memory hierarchy is designed to share and reuse data locally, avoiding redundant DRAM accesses. A direct data link passes data between system peripherals and processors to further reduce DRAM bandwidth. Implementation results show that an SoC combining all of these techniques significantly outperforms other works on the ETHZ AI Benchmark.
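As a rough illustration of why local data reuse matters, the following back-of-envelope model (our own sketch with hypothetical layer sizes, not MediaTek's figures) compares DRAM traffic for a single convolution layer with and without an on-chip buffer that lets input activations be reused across output channels:

```python
def dram_traffic_bytes(h, w, c_in, c_out, k=3, bytes_per_el=1, local_reuse=True):
    """Estimate DRAM bytes moved for one KxK convolution layer.

    Simplified model: weights and outputs always cross DRAM once;
    input traffic depends on whether on-chip reuse is available.
    """
    weights = k * k * c_in * c_out * bytes_per_el   # fetched once
    outputs = h * w * c_out * bytes_per_el          # written once
    if local_reuse:
        # Input feature map is read once and reused on-chip
        # for all output channels.
        inputs = h * w * c_in * bytes_per_el
    else:
        # Without local reuse, the input is re-fetched from DRAM
        # for every output channel.
        inputs = h * w * c_in * c_out * bytes_per_el
    return weights + inputs + outputs

# Hypothetical example: a 56x56x64 -> 56x56x64 layer, INT8 data.
naive = dram_traffic_bytes(56, 56, 64, 64, local_reuse=False)
reused = dram_traffic_bytes(56, 56, 64, 64, local_reuse=True)
print(f"naive: {naive / 1e6:.1f} MB, with reuse: {reused / 1e6:.1f} MB")
```

Even in this toy model, local reuse cuts input traffic by a factor equal to the number of output channels, which is the intuition behind designing the memory hierarchy to keep data on-chip.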
