Partition and Scheduling Algorithms for Neural Network Accelerators

In recent years, artificial neural networks have evolved rapidly and are being applied in a wide range of fields. To improve the computational efficiency of neural network applications, a growing number of neural network accelerators have been developed. Although task scheduling on heterogeneous systems has been studied intensively, traditional scheduling algorithms cannot be applied to neural network accelerators directly. Based on the typical characteristics of neural network accelerators, we formalize the task scheduling problem for neural networks and adapt two list-based heuristic scheduling algorithms, Heterogeneous Earliest-Finish-Time (HEFT) and Critical-Path-on-a-Processor (CPOP). Inspired by the separable nature of neural network operations, we propose two partition algorithms, the Iterative Partition Scheduling algorithm (IPS) and the Partition Scheduling Combination algorithm (PSC), which can be combined with the scheduling algorithms. We evaluate these algorithms on several typical neural networks, and the results show that the partition-based algorithms achieve roughly a 2x to 3x speedup over the scheduling-only algorithms.
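As background, HEFT is a list scheduling heuristic: it ranks each task in the dependency graph by its "upward rank" (the task's average computation cost plus the heaviest chain of computation and communication costs down to an exit task), then assigns tasks in decreasing rank order to whichever processor yields the earliest finish time. The following is a minimal Python sketch of that idea on a toy task graph; the DAG, the cost numbers, and the helper names are illustrative assumptions rather than details from the paper, and it uses a simple append-based (non-insertion) placement policy.

# Minimal HEFT-style list scheduling sketch on a toy two-processor task graph.
# All graph structure and cost values below are illustrative assumptions.
from functools import lru_cache

# cost[task] = per-processor computation costs for that task
cost = {
    "a": [14, 16], "b": [13, 19], "c": [11, 13],
    "d": [13,  8], "e": [12, 13],
}
# comm[(parent, child)] = data-transfer cost if placed on different processors
comm = {("a", "b"): 18, ("a", "c"): 12, ("b", "d"): 9,
        ("c", "d"): 11, ("b", "e"): 16, ("d", "e"): 7}

succ, pred = {}, {}
for (u, v) in comm:
    succ.setdefault(u, []).append(v)
    pred.setdefault(v, []).append(u)

def avg(t):
    # average computation cost of task t across processors
    return sum(cost[t]) / len(cost[t])

@lru_cache(maxsize=None)
def rank_u(t):
    # upward rank: avg cost plus the heaviest path to an exit task
    children = succ.get(t, [])
    if not children:
        return avg(t)
    return avg(t) + max(comm[(t, c)] + rank_u(c) for c in children)

def heft():
    # decreasing upward rank is always a valid topological order
    order = sorted(cost, key=rank_u, reverse=True)
    proc_ready = [0.0, 0.0]   # next free time on each processor
    placed = {}               # task -> (processor, finish_time)
    for t in order:
        best = None
        for p in range(len(proc_ready)):
            # data from predecessors on other processors incurs a comm delay
            ready = max([placed[u][1] + (comm[(u, t)] if placed[u][0] != p else 0)
                         for u in pred.get(t, [])] or [0.0])
            finish = max(ready, proc_ready[p]) + cost[t][p]
            if best is None or finish < best[0]:
                best = (finish, p)
        finish, p = best
        proc_ready[p] = finish
        placed[t] = (p, finish)
    return placed

print(heft())  # e.g. {'a': (0, 14.0), 'b': (0, 27.0), ...}

The partition algorithms build on the observation that many neural network operations are separable, e.g. a convolution can be split along its output channels; splitting one node of the task graph into independent sub-nodes exposes extra parallelism for the scheduler to exploit, which is the intuition behind the reported 2x to 3x speedup.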
