SAZD: A Low Computational Load Coded Distributed Computing Framework for IoT Systems

Coded distributed computing (CDC) addresses the problem that a matrix multiplication of extremely large dimension cannot be executed on a single Internet-of-Things (IoT) node. The encoding in all existing CDC schemes is based on linear combination (LC) to generate independent computation tasks, which introduces a heavy computational load in the encoding and decoding phases: a significant volume of expensive multiplications (compared with inexpensive additions) and even more expensive divisions. Note that the number of elementwise multiplications of the LC operation during the encoding phase is $N$ times that of the original computation task, where $N$ denotes the number of worker nodes. In this article, to avoid the expensive multiplications introduced by LC, a new CDC framework based on shift-and-addition (SA) over the real field is proposed. In addition, to avoid the expensive matrix inversion (divisions) in the decoding phase, zigzag decoding (ZD) is incorporated. The proposed scheme, which combines SA and ZD and is hence named SAZD-based CDC, avoids expensive multiplications and divisions in both the encoding and decoding phases. It targets two objectives simultaneously: any $K$ out of the $N$ generated computation tasks are independent and can recover the original computation tasks with the ZD algorithm, and the shift distance is small, so that only a light additional computational load is incurred in the computation phase. Both analysis and practical study show that, compared with LC-based CDC, SAZD-based CDC significantly reduces the computational load.
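The idea of combining shift-and-addition encoding with zigzag decoding can be illustrated with a small sketch. This is not the construction from the article: the function names `sa_encode` and `zigzag_decode`, the shift pattern `n * k`, and the sizes $K = 2$, $N = 3$ are all assumptions chosen for illustration. The matrix is split into $K$ row blocks, each coded task is formed by shifting the blocks and adding them (no multiplications), each worker multiplies its coded task by the vector, and the master recovers the original products from any $K$ results by repeatedly finding an entry with a single unknown term and subtracting the known ones (no divisions).

```python
import itertools
import numpy as np

def sa_encode(blocks, shifts):
    """Shift-and-add: coded row i is the sum over k of blocks[k][i - shifts[k]]."""
    L, cols = blocks[0].shape
    coded = np.zeros((L + max(shifts), cols))
    for block, s in zip(blocks, shifts):
        coded[s:s + L] += block  # additions only, no multiplications
    return coded

def zigzag_decode(results, shift_table, L):
    """Recover the K*L original entries by peeling, using subtractions only."""
    K = len(shift_table[0])
    known = [[None] * L for _ in range(K)]
    remaining = K * L
    progress = True
    while remaining and progress:
        progress = False
        for res, shifts in zip(results, shift_table):
            for i in range(len(res)):
                # (k, j) pairs: entry j of block k contributes to coded entry i
                srcs = [(k, i - s) for k, s in enumerate(shifts) if 0 <= i - s < L]
                unknown = [(k, j) for k, j in srcs if known[k][j] is None]
                if len(unknown) == 1:
                    k, j = unknown[0]
                    others = sum(known[a][b] for a, b in srcs if (a, b) != (k, j))
                    known[k][j] = res[i] - others
                    remaining -= 1
                    progress = True
    if remaining:
        raise ValueError("these results are not zigzag decodable")
    return np.array(known, dtype=float)

# Hypothetical toy setup: K = 2 row blocks, N = 3 workers, L = 4 rows per block.
rng = np.random.default_rng(0)
K, N, L = 2, 3, 4
A = rng.integers(-5, 5, size=(K * L, 3)).astype(float)
x = rng.integers(-5, 5, size=3).astype(float)
blocks = np.split(A, K)

# Assumed shift pattern: worker n shifts block k down by n * k rows.
shift_table = [[n * k for k in range(K)] for n in range(N)]
worker_results = [sa_encode(blocks, s) @ x for s in shift_table]

# Any K of the N worker results should reproduce A @ x.
expected = (A @ x).reshape(K, L)
all_pairs_ok = all(
    np.allclose(
        zigzag_decode([worker_results[n] for n in pair],
                      [shift_table[n] for n in pair], L),
        expected,
    )
    for pair in itertools.combinations(range(N), K)
)
```

In this sketch a coded task with maximum shift $s$ has $L + s$ rows instead of $L$, which is the extra work a worker pays in the computation phase. This is why a small shift distance matters.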
