Towards Federated Learning Approach to Determine Data Relevance in Big Data

Over the past few years, the volume of data has grown enormously, and big data has become a driving force behind many machine learning innovations. However, the incessant generation of data in the information age poses a needle-in-a-haystack problem: it has become challenging to separate useful data from a mass of irrelevant data. The result is a quality-versus-quantity issue in data science, where a great deal of data is generated but most of it is irrelevant. Furthermore, most of the data and the resources needed to train machine learning models effectively are owned by major tech companies, resulting in a centralization problem. Federated learning seeks to address this by changing how machine learning models are trained, adopting a distributed approach to training. Another promising technology is the blockchain, whose immutable nature ensures data integrity. By combining the blockchain's trust mechanism with federated learning's ability to disrupt data centralization, we propose an approach that determines which data is relevant and stores that data in a decentralized manner.
