Anonymization of System Logs for Preserving Privacy and Reducing Storage

System logs are a valuable source of information for analyzing and diagnosing system behavior, but the analysis is highly time-consuming for large log volumes. For many parallel computing centers, outsourcing the analysis of system logs (syslogs) to third parties is the only option. A general analysis and diagnosis solution is therefore needed, and such a solution is possible only by analyzing syslogs from multiple computing systems. However, the data within syslogs can be sensitive, which obstructs sharing them across institutions, with third-party entities, or in the public domain. This work proposes a new method for the anonymization of syslogs that employs de-identification and encoding to provide fully shareable system logs. In addition to eliminating the sensitive data within the test logs, the proposed anonymization method yields a 25% performance improvement in post-processing of the anonymized syslogs and reduces their required storage space by more than 80%.
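The abstract does not spell out the de-identification and encoding steps, so the following Python sketch only illustrates the general idea and is not the authors' method. It assumes that sensitive fields are IPv4 addresses and `user=` values (the patterns `IPV4` and `USER` are hypothetical), replaces them with truncated salted hashes, and then encodes each distinct de-identified line as a small integer, which is one simple way repeated messages could take less space.

```python
import hashlib
import re

# Hypothetical patterns for sensitive fields; a real site would need its own list.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER = re.compile(r"(\buser[= ])(\w+)")


def pseudonym(value: str, salt: str = "demo-salt") -> str:
    """Map a sensitive value to a short, stable token (salted SHA-256, truncated).

    Truncation to 8 hex characters is an assumption made for brevity;
    it trades collision resistance for shorter output.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]


def deidentify(line: str) -> str:
    """Replace IPv4 addresses and user names with stable pseudonyms."""
    line = IPV4.sub(lambda m: "ip_" + pseudonym(m.group(0)), line)
    line = USER.sub(lambda m: m.group(1) + "u_" + pseudonym(m.group(2)), line)
    return line


def encode(lines):
    """Encode each distinct line as an integer code, in order of first appearance."""
    codebook = {}
    encoded = []
    for line in lines:
        # len(codebook) is evaluated before insertion, so new lines get the next code.
        encoded.append(codebook.setdefault(line, len(codebook)))
    return encoded, codebook
```

Because the same input always maps to the same token, identical events remain identical after de-identification, so they can still be counted and correlated, and the codebook stores each distinct message only once. Whether salted hashing is strong enough for a given sharing scenario is a separate question that this sketch does not address.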
