Protecting Big Data Privacy Using Randomized Tensor Network Decomposition and Dispersed Tensor Computation

Data privacy is an important issue for organizations and enterprises that want to securely outsource data storage, sharing, and computation to clouds / fogs. However, data encryption is complicated by key management and distribution, and existing secure computation techniques are expensive in computational / communication cost and therefore do not scale to big data computation. Tensor network decomposition and distributed tensor computation have been widely used in signal processing and machine learning for dimensionality reduction and large-scale optimization. However, the potential of distributed tensor networks for big data privacy preservation has not been considered before; this motivates the current study. Our primary intuition is twofold: tensor network representations are mathematically non-unique, unlinkable, and uninterpretable, and they naturally support a range of multilinear operations for compressed and distributed / dispersed computation. Therefore, we propose randomized algorithms to decompose big data into randomized tensor network representations and analyze the privacy leakage for 1D to 3D data tensors. The randomness mainly comes from the complex structural information commonly found in big data; randomization is based on controlled perturbation applied to the tensor blocks prior to decomposition. The distributed tensor representations are dispersed on multiple clouds / fogs or servers / devices with metadata privacy; this provides both distributed trust and management to seamlessly secure big data storage, communication, sharing, and computation. Experiments show that the proposed randomization techniques are helpful for big data anonymization and efficient for big data storage and computation.
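The non-uniqueness claim can be illustrated with a minimal NumPy sketch. It is not the randomized algorithm proposed in the paper (which perturbs tensor blocks before decomposition); it only shows, under the assumption of a plain tensor-train format for a 3D tensor, that inserting random orthogonal matrices between neighbouring cores yields different cores that reconstruct the same data. The function names `tt_decompose`, `tt_reconstruct`, and `randomize_gauge` are hypothetical and chosen for illustration.

```python
import numpy as np


def tt_decompose(x):
    """Exact tensor-train decomposition of a 3-way array using two SVDs."""
    n1, n2, n3 = x.shape
    # First unfolding: mode 1 versus modes (2, 3).
    u, s, vt = np.linalg.svd(x.reshape(n1, n2 * n3), full_matrices=False)
    r1 = u.shape[1]
    g1 = u                                          # shape (n1, r1)
    # Carry the remainder forward and split mode 2 from mode 3.
    rest = (s[:, None] * vt).reshape(r1 * n2, n3)
    u, s, vt = np.linalg.svd(rest, full_matrices=False)
    r2 = u.shape[1]
    g2 = u.reshape(r1, n2, r2)                      # shape (r1, n2, r2)
    g3 = s[:, None] * vt                            # shape (r2, n3)
    return g1, g2, g3


def tt_reconstruct(g1, g2, g3):
    """Contract the three cores back into the full 3-way array."""
    return np.einsum("ia,ajb,bk->ijk", g1, g2, g3)


def randomize_gauge(g1, g2, g3, rng):
    """Illustrative assumption: mix the cores with random orthogonal matrices.

    Each matrix and its transpose cancel between neighbouring cores,
    so the contracted tensor is unchanged while every core changes.
    """
    r1, r2 = g1.shape[1], g3.shape[0]
    q1, _ = np.linalg.qr(rng.standard_normal((r1, r1)))
    q2, _ = np.linalg.qr(rng.standard_normal((r2, r2)))
    h1 = g1 @ q1
    h2 = np.einsum("ab,bjc,cd->ajd", q1.T, g2, q2)
    h3 = q2.T @ g3
    return h1, h2, h3


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
cores = tt_decompose(x)
mixed = randomize_gauge(*cores, rng)
print(np.allclose(tt_reconstruct(*cores), x))   # original cores rebuild x
print(np.allclose(tt_reconstruct(*mixed), x))   # mixed cores rebuild x too
print(np.allclose(cores[0], mixed[0]))          # yet the cores themselves differ
```

Orthogonal mixing matrices are used here only to keep the example well conditioned; any invertible matrix and its inverse would leave the reconstruction unchanged in the same way.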
