Anomaly Detection in High Performance Computers: A Vicinity Perspective
[1] Christian Engelmann,et al. Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] R. Service,et al. China's planned exascale computer threatens Summit's position at the top , 2018, Science.
[3] Dhabaleswar K. Panda,et al. Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[4] Noah Treuhaft,et al. Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, , 2002 .
[5] Peter Filzmoser,et al. Dynamic log file analysis: An unsupervised cluster evolution approach for anomaly detection , 2018, Comput. Secur..
[6] Wolfgang E. Nagel,et al. Lessons Learned from Spatial and Temporal Correlation of Node Failures in High Performance Computers , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
[7] Anthony A. Maciejewski,et al. Optimizing checkpoint intervals for reduced energy use in exascale systems , 2017, 2017 Eighth International Green and Sustainable Computing Conference (IGSC).
[8] Risto Vaarandi,et al. LogCluster - A data clustering and pattern mining algorithm for event logs , 2015, 2015 11th International Conference on Network and Service Management (CNSM).
[9] Feifei Li,et al. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning , 2017, CCS.
[10] Christian Engelmann,et al. Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[11] Akio Watanabe,et al. Proactive failure detection learning generation patterns of large-scale network logs , 2015, 2015 11th International Conference on Network and Service Management (CNSM).
[12] Jannis Klinkenberg,et al. Data Mining-Based Analysis of HPC Center Operations , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[13] Bianca Schroeder,et al. Reading between the lines of failure logs: Understanding how HPC systems fail , 2013, 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[14] Franck Cappello,et al. Toward Exascale Resilience: 2014 update , 2014, Supercomput. Front. Innov..
[15] Andy B. Yoo,et al. X-ray Pulse Compression Using Strained Crystals , 2002 .
[16] Florina M. Ciorba,et al. Assessing Data Usefulness for Failure Analysis in Anonymized System Logs , 2018, 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC).
[17] Al Geist,et al. A survey of high-performance computing scaling challenges , 2017, Int. J. High Perform. Comput. Appl..
[18] Risto Vaarandi,et al. An unsupervised framework for detecting anomalous messages from syslog log files , 2018, NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium.
[19] Alexander Aiken,et al. Alert Detection in System Logs , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[20] Christian Engelmann,et al. A Big Data Analytics Framework for HPC Log Data: Three Case Studies Using the Titan Supercomputer Log , 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).
[21] Nithin Nakka,et al. Predicting Node Failure in High Performance Computing Systems from Failure and Usage Logs , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[22] Frank Mueller,et al. Desh: deep learning for system health prediction of lead times to failure in HPC , 2018, HPDC.
[23] Florina M. Ciorba,et al. Anonymization of System Logs for Preserving Privacy and Reducing Storage , 2018 .
[24] Vipin Kumar,et al. Anomaly Detection for Discrete Sequences: A Survey , 2012, IEEE Transactions on Knowledge and Data Engineering.
[25] Franck Cappello,et al. LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[26] Ke Zhang,et al. Automated IT System Failure Prediction: A Deep Learning Approach , 2016, 2016 IEEE International Conference on Big Data (Big Data).