Security of HPC Systems: From a Log-analyzing Perspective

High Performance Computing (HPC) systems mainly focus on improving computing performance. They offer highly competitive processing capacity in terms of both calculation speed and available memory. HPC infrastructures are therefore valuable computing resources that must be carefully guarded against malicious use. Vulnerabilities are critical issues in HPC systems, because most of the jobs they run and the data they store involve sensitive, high-value information. In this survey, we comprehensively review the security of HPC systems from a log-analysis perspective, covering well-known attacks and widely used defenses, especially intrusion detection methods. We found that log files are used for security purposes much less than we expected. How to use all available log files comprehensively and apply state-of-the-art intrusion detection techniques to improve the robustness of HPC systems remains an open question for future research.

Received on 07 July 2019; accepted on 29 July 2019; published on 01 August 2019
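To make the log-analysis perspective concrete, the following is a minimal sketch of one of the simplest log-based intrusion detection heuristics: counting failed SSH password logins per source address, which is how brute-force attempts against a login node typically show up in the logs. It assumes OpenSSH's `Failed password for ... from <ip> port <n>` syslog format; the function name `flag_bruteforce_sources` and the default threshold of 5 are hypothetical illustrations, not a method taken from the surveyed work.

```python
import re
from collections import Counter

# Matches OpenSSH "Failed password" syslog entries, e.g.
# "sshd[1234]: Failed password for invalid user bob from 203.0.113.5 port 52144 ssh2"
# The exact log format is an assumption; adjust it to the site's syslog layout.
FAILED_LOGIN = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+ "
    r"from (?P<ip>\S+) port \d+"
)


def flag_bruteforce_sources(lines, threshold=5):
    """Return {source_ip: failure_count} for IPs with >= `threshold` failed logins.

    `threshold` is a hypothetical, illustrative cutoff, not a recommended value.
    """
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}


if __name__ == "__main__":
    # Synthetic example lines (not real log data): six failures from one address,
    # one failure and one successful login from another.
    sample = [
        f"Jul  7 10:00:{i:02d} login1 sshd[{100 + i}]: Failed password for "
        f"invalid user admin from 203.0.113.5 port {40000 + i} ssh2"
        for i in range(6)
    ]
    sample.append("Jul  7 10:01:00 login1 sshd[200]: Failed password for alice "
                  "from 198.51.100.7 port 50000 ssh2")
    sample.append("Jul  7 10:01:05 login1 sshd[201]: Accepted publickey for alice "
                  "from 198.51.100.7 port 50001 ssh2")
    print(flag_bruteforce_sources(sample))
```

A fixed count with no time window is the crudest possible design choice; tools such as fail2ban instead count failures per address within a sliding time window (`findtime`) against a retry limit (`maxretry`), which reduces false positives from users who occasionally mistype a password.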
