Learning Without Peeking: Secure Multi-party Computation Genetic Programming

Genetic Programming is widely used to build predictive models for defect proneness or development efforts. The predictive modelling often depends on the use of sensitive data, related to past faults or internal resources, as training data. We envision a scenario in which revealing the training data constitutes a violation of privacy. To ensure organisational privacy in such a scenario, we propose SMCGP, a method that performs Genetic Programming as Secure Multiparty Computation. In SMCGP, one party uses GP to learn a model of training data provided by another party, without actually knowing each datapoint in the training data. We present an SMCGP approach based on the garbled circuit protocol, which is evaluated using two problem sets: a widely studied symbolic regression benchmark, and a GP-based fault localisation technique with real world fault data from Defects4J benchmark. The results suggest that SMCGP can be equally accurate as the normal GP, but the cost of keeping the training data hidden can be about three orders of magnitude slower execution.

[1]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[2]  Maria-Florina Balcan,et al.  Distributed Learning, Communication Complexity and Privacy , 2012, COLT.

[3]  Rui Abreu,et al.  A Survey on Software Fault Localization , 2016, IEEE Transactions on Software Engineering.

[4]  Mariana Raykova,et al.  Privacy-Preserving Distributed Linear Regression on High-Dimensional Data , 2017, Proc. Priv. Enhancing Technol..

[5]  Wenliang Du,et al.  Secure multi-party computation problems and their applications: a review and open problems , 2001, NSPW '01.

[6]  Adi Shamir,et al.  A method for obtaining digital signatures and public-key cryptosystems , 1978, CACM.

[7]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[8]  Anand D. Sarwate,et al.  Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data , 2013, IEEE Signal Processing Magazine.

[9]  Dick den Hertog,et al.  Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming , 2009, IEEE Transactions on Evolutionary Computation.

[10]  Xiao-Yuan Jing,et al.  On the Multiple Sources and Privacy Preservation Issues for Heterogeneous Defect Prediction , 2019, IEEE Transactions on Software Engineering.

[11]  Marc Parizeau,et al.  DEAP: evolutionary algorithms made easy , 2012, J. Mach. Learn. Res..

[12]  Michael Walfish,et al.  Pretzel: Email encryption and provider-supplied functions are compatible , 2017, SIGCOMM.

[13]  Christos Gkantsidis,et al.  VC3: Trustworthy Data Analytics in the Cloud Using SGX , 2015, 2015 IEEE Symposium on Security and Privacy.

[14]  Shin Yoo,et al.  Empirical evaluation of conditional operators in GP based fault localization , 2017, GECCO.

[15]  Claire Le Goues,et al.  A genetic programming approach to automated software repair , 2009, GECCO.

[16]  Maarten Keijzer,et al.  Improving Symbolic Regression with Interval Arithmetic and Linear Scaling , 2003, EuroGP.

[17]  Shin Yoo,et al.  FLUCCS: using code and change metrics to improve fault localization , 2017, ISSTA.

[18]  Name M. Lastname Automatically Finding Patches Using Genetic Programming , 2013 .

[19]  Máire O'Neill,et al.  Practical homomorphic encryption: A survey , 2014, 2014 IEEE International Symposium on Circuits and Systems (ISCAS).

[20]  Taghi M. Khoshgoftaar,et al.  Genetic programming model for software quality classification , 2001, Proceedings Sixth IEEE International Symposium on High Assurance Systems Engineering. Special Topic: Impact of Networking.

[21]  Tim Menzies,et al.  Balancing Privacy and Utility in Cross-Company Defect Prediction , 2013, IEEE Transactions on Software Engineering.

[22]  Carlos V. Rozas,et al.  Innovative instructions and software model for isolated execution , 2013, HASP '13.

[23]  Wojciech Jaskowski,et al.  Better GP benchmarks: community survey results and proposals , 2012, Genetic Programming and Evolvable Machines.

[24]  Tihana Galinac Grbac,et al.  Co-evolutionary multi-population genetic programming for classification in software defect prediction: An empirical case study , 2017, Appl. Soft Comput..

[25]  Galen C. Hunt,et al.  Shielding Applications from an Untrusted Cloud with Haven , 2014, OSDI.

[26]  David Evans,et al.  Obliv-C: A Language for Extensible Data-Oblivious Computation , 2015, IACR Cryptol. ePrint Arch..

[27]  Ittai Anati,et al.  Innovative Technology for CPU Based Attestation and Sealing , 2013 .

[28]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[29]  Oded Goldreich,et al.  A randomized protocol for signing contracts , 1985, CACM.

[30]  Filomena Ferrucci,et al.  Genetic Programming for Effort Estimation: An Analysis of the Impact of Different Fitness Functions , 2010, 2nd International Symposium on Search Based Software Engineering.

[31]  Ahmad-Reza Sadeghi,et al.  TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits , 2015, 2015 IEEE Symposium on Security and Privacy.

[32]  Michael O'Neill,et al.  Genetic Programming and Evolvable Machines Manuscript No. Semantically-based Crossover in Genetic Programming: Application to Real-valued Symbolic Regression , 2022 .

[33]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[34]  Shin Yoo,et al.  Evolving Human Competitive Spectra-Based Fault Localisation Techniques , 2012, SSBSE.

[35]  Riccardo Poli,et al.  A Field Guide to Genetic Programming , 2008 .

[36]  Quanquan Gu,et al.  Aggregating Private Sparse Learning Models Using Multi-Party Computation , 2016 .

[37]  Shin Yoo,et al.  GPGPGPU: Evaluation of Parallelisation of Genetic Programming Using GPGPU , 2017, SSBSE.