Long-term Data Sharing under Exclusivity Attacks

The quality of learning generally improves with the scale and diversity of data. Companies and institutions can therefore benefit from building models over shared data. Many cloud and blockchain platforms, as well as government initiatives, are interested in providing this type of service. These cooperative efforts face a challenge, which we call "exclusivity attacks": a firm can share distorted data so that it still learns the best model fit while misleading the other participants. We study protocols for long-term interactions and their vulnerability to these attacks, in particular for regression and clustering tasks. We find that the choice of communication protocol is essential for vulnerability: a protocol is much more vulnerable if firms can continuously initiate communication than if they are periodically asked for their inputs. Vulnerability may also depend on the number of Sybil identities a firm can control.
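
As a rough illustration of the idea of an exclusivity attack, the sketch below uses the simplest possible "model", a pooled mean, rather than the regression and clustering tasks studied here. The function names, the constant-shift distortion, and the assumption that the attacker knows the total number of pooled points are all hypothetical choices made for this toy example, not the protocols analysed in the paper.

```python
from statistics import fmean

def aggregate_mean(reports):
    """Protocol stand-in (assumed): pool all reported points, publish their mean."""
    pooled = [x for points in reports.values() for x in points]
    return fmean(pooled)

def distort(true_points, shift):
    """Attacker's report: each true point moved by a constant it alone knows."""
    return [x + shift for x in true_points]

def undo_shift(published_mean, n_attacker, n_total, shift):
    """Attacker's private correction: subtract its own known bias from the result."""
    return published_mean - shift * n_attacker / n_total

# Toy data (made up for illustration).
honest = [1.0, 2.0, 3.0]
attacker = [4.0, 6.0]
shift = 10.0

reports = {"honest": honest, "attacker": distort(attacker, shift)}
published = aggregate_mean(reports)       # what every participant receives
n_total = len(honest) + len(attacker)     # assumed known to the attacker
recovered = undo_shift(published, len(attacker), n_total, shift)
true_mean = fmean(honest + attacker)      # fit on the undistorted pooled data
```

In this toy setting the attacker's corrected value matches the fit on the undistorted pool, while the honest firm is left with the biased published value. Whether such a correction is possible in general depends on the task and on the communication protocol, which is the question the paper studies.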
