Exploiting Computation Reuse in Cloud-Based Deep Learning via Input Reordering

Deep learning (DL) has become increasingly important owing to its transformative effect on a wide range of applications. During inference, DL models are deployed in the cloud to answer online queries. One crucial issue in DL inference is energy consumption, which significantly degrades computation performance. Many prior studies therefore reduce energy consumption through similarity-based computation reuse. However, when inputs arrive individually from mobile devices, these schemes perform poorly, because the similarity needed for reuse is difficult to exploit directly in a disordered stream of individual inputs. Our initial experimental observations show that (1) individual inputs still exhibit high similarity that can be reused, and (2) the total similarity exploited during computation is related to the characteristics of the input data. This motivates us to design a reordering scheme that enhances similarity for computation reuse. Our main approach is to predict the similarity among inputs using statistical methods and then to determine the execution order accordingly. Based on these techniques, we propose an effective input reordering scheme for computation reuse that saves energy. Evaluation on various benchmarks demonstrates that the reordering scheme significantly outperforms previous schemes: for instance, computation reuse is improved by up to $1.1\times$ and energy consumption is reduced to 40% of that under the traditional computation reuse configuration.
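The abstract does not detail the statistical predictor or the ordering algorithm, so the sketch below is only a rough illustration of the underlying idea: greedily reorder a queue of inputs so that consecutive inputs are as similar as possible, which raises the chance that intermediate results can be reused. The function name, the choice of cosine similarity, and the greedy nearest-neighbor ordering are illustrative assumptions, not the authors' method.

```python
import numpy as np

def greedy_similarity_reorder(batch):
    """Return an execution order for a batch of flattened inputs
    such that consecutive inputs are highly similar (assumed proxy
    for computation-reuse opportunity).

    batch: np.ndarray of shape (n, d), one flattened input per row.
    """
    # Normalize rows so that dot products equal cosine similarity.
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    unit = batch / np.maximum(norms, 1e-12)

    n = len(batch)
    remaining = set(range(1, n))
    order = [0]                      # start from the first queued input
    while remaining:
        last = unit[order[-1]]
        # Pick the unscheduled input most similar to the last scheduled one.
        idx = max(remaining, key=lambda i: float(unit[i] @ last))
        order.append(idx)
        remaining.remove(idx)
    return order

if __name__ == "__main__":
    # Example: reorder 8 random "inputs" of dimension 16.
    rng = np.random.default_rng(0)
    inputs = rng.random((8, 16)).astype(np.float32)
    print(greedy_similarity_reorder(inputs))
```

In practice, the paper's scheme would replace the pairwise cosine computation with a statistical prediction of similarity, since computing exact similarities for every pair can itself be costly.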
