Protecting Machine Learning Integrity in Distributed Big Data Networking

A distributed big data network integrates big data with the underlying distributed network. This emerging paradigm makes it possible to divide big data processing tasks into smaller ones that can be processed intelligently and in parallel, using machine learning on distributed network resources. Such a pattern requires strict system integrity, in particular machine learning integrity against data tampering or network control by malicious nodes. In this article, we propose a secure architecture, consisting of one HaSi scheme and two data tampering detection schemes, for protecting machine learning integrity in distributed big data networking. Illustrative results demonstrate the effectiveness of the proposed schemes and show that they can preserve learning accuracy even when 30-40 percent of processing nodes are maliciously controlled. When that share rises to 40-50 percent, the accuracy of the proposed schemes begins to fall visibly, but it still outperforms the scenario without protection by up to 70-80 percent.
