Assessing Data Usefulness for Failure Analysis in Anonymized System Logs

System logs are a valuable source of information for analyzing and understanding system behavior with the aim of improving system performance. Such logs contain various types of information, including sensitive information. Information deemed sensitive can either be extracted directly from system logs by correlating several log entries, or be inferred by combining the (non-sensitive) information contained in system logs with other logs and/or additional datasets. The analysis of system logs containing sensitive information compromises data privacy. Therefore, data and computing centers have, over the years, employed various anonymization techniques, such as generalization and suppression, to protect the privacy of their users, their data, and the system as a whole. Anonymizing data via generalization and suppression preserves privacy but may significantly decrease data usefulness, thus hindering the intended analysis of system behavior. Maintaining a balance between data usefulness and privacy preservation therefore remains an open and important challenge. Irreversible encoding of system logs using collision-resistant hashing algorithms, such as SHAKE-128, is a novel approach previously introduced by the authors to mitigate data privacy concerns. The present work studies the applicability of this encoding approach to the system logs of a production high performance computing system. Moreover, a metric is introduced to assess how useful the anonymized system logs remain for detecting and identifying the failures encountered in the system.
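The idea of irreversible encoding with SHAKE-128 can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, the per-token hashing, the 8-byte digest length, the helper names `encode_token` and `encode_entry`, and the sample log lines are all assumptions made for illustration only.

```python
import hashlib


def encode_token(token: str, digest_bytes: int = 8) -> str:
    """Irreversibly encode one log token with SHAKE-128.

    SHAKE-128 is an extendable-output function, so the digest length
    (here an assumed 8 bytes, i.e. 16 hex characters) is chosen by the caller.
    """
    return hashlib.shake_128(token.encode("utf-8")).hexdigest(digest_bytes)


def encode_entry(entry: str, digest_bytes: int = 8) -> str:
    # Assumption: entries are split on whitespace and each token is hashed
    # separately, so identical tokens map to identical digests.
    return " ".join(encode_token(t, digest_bytes) for t in entry.split())


# Hypothetical log lines, invented for this example.
entries = [
    "node042 kernel: CPU3 machine check error",
    "node017 kernel: CPU3 machine check error",
]
encoded = [encode_entry(e) for e in entries]

for line in encoded:
    print(line)
```

Under these assumptions, the two encoded lines differ only in their first token: the node names are hidden, yet the recurring message pattern can still be matched across entries, which is the kind of usefulness a failure analysis relies on.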
