Cocktail: Cost-efficient and Data Skew-aware Online In-Network Distributed Machine Learning for Intelligent 5G and Beyond

To facilitate emerging applications in 5G networks and beyond, mobile network operators will provide many powerful control functionalities such as RAN slicing and resource scheduling. These control functionalities generally comprise a series of prediction tasks, such as channel state information prediction, cellular traffic prediction, and user mobility prediction, which will be enabled by machine learning (ML) techniques. However, training the ML models offline is inefficient because of the excessive overhead of forwarding the huge volume of data samples from cellular networks to remote ML training clouds. Leveraging the edge computing paradigm, we advocate cooperative online in-network ML training across edge clouds. To alleviate the data skew issue caused by the capacity heterogeneity and dynamics of edge clouds while avoiding excessive overhead, we propose Cocktail, a cost-efficient and data skew-aware online in-network distributed machine learning framework. We build a comprehensive model and formulate an online data scheduling problem to optimize the framework cost while reconciling the data skew from both short-term and long-term perspectives. We exploit stochastic gradient descent to devise an online asymptotically optimal algorithm. As its core building block, we propose optimal policies based on novel graph constructions to solve two subproblems, respectively. We also improve the proposed online algorithm with online learning for fast convergence of in-network ML training. A small-scale testbed and large-scale simulations validate the superior performance of our framework.
