ALBADross: Active Learning Based Anomaly Diagnosis for Production HPC Systems
B. Kulis | V. Leung | A. Coskun | B. Schwaller | Burak Aksar | Jim Brandt | O. Aaziz | Efe Sencan
[1] Thai V. Hoang, et al. Little Help Makes a Big Difference: Leveraging Active Learning to Improve Unsupervised Time Series Anomaly Detection, 2022, ICSOC Workshops.
[2] Mohamed H. Sedky, et al. SALAD: An Exploration of Split Active Learning based Unsupervised Network Data Stream Anomaly Detection using Autoencoders, 2021.
[3] Jorge Ortiz, et al. RLAD: Time Series Anomaly Detection through Reinforcement Learning and Active Learning, 2021, arXiv.
[4] Xin Liu, et al. Sunway supercomputer architecture towards exascale computing: analysis and practice, 2021, Sci. China Inf. Sci.
[5] Kris Villez, et al. Active learning for anomaly detection in environmental data, 2020, Environ. Model. Softw.
[6] Nicholas J. Wright, et al. Quantifying the impact of network congestion on application performance and network metrics, 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER).
[7] Yao Wang, et al. Practical and White-Box Anomaly Detection through Unsupervised and Active Learning, 2020, 2020 29th International Conference on Computer Communications and Networks (ICCCN).
[8] Rafal A. Angryk, et al. MVTS-Data Toolkit: A Python package for preprocessing multivariate time series data, 2020, SoftwareX.
[9] Luca Benini, et al. A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems, 2019, Eng. Appl. Artif. Intell.
[10] Ye Lu, et al. An Efficient Log Parsing Algorithm Based on Heuristic Rules, 2019, APPT.
[11] Vitus J. Leung, et al. HPAS: An HPC Performance Anomaly Suite for Reproducing Performance Variations, 2019, ICPP.
[12] Ayse K. Coskun, et al. Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning, 2019, IEEE Transactions on Parallel and Distributed Systems.
[13] Klaus Mueller, et al. A Visual Analytics Framework for the Detection of Anomalous Call Stack Trees in High Performance Computing Applications, 2019, IEEE Transactions on Visualization and Computer Graphics.
[14] Luca Benini, et al. Anomaly Detection using Autoencoders in High Performance Computing Systems, 2018, DDC@AI*IA.
[15] Andreas W. Kempa-Liehr, et al. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh - A Python package), 2018, Neurocomputing.
[16] Vitus J. Leung, et al. Taxonomist: Application Detection Through Rich Monitoring Data, 2018, Euro-Par.
[17] Tiago Pimentel, et al. Deep Active Learning for Anomaly Detection, 2018, 2020 International Joint Conference on Neural Networks (IJCNN).
[18] Péter Horváth, et al. modAL: A modular active learning framework for Python, 2018, arXiv.
[19] Tie-Yan Liu, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.
[20] Kevin Harms, et al. Run-to-run Variability on Xeon Phi based Cray XC Systems, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Yijia Zhang, et al. Diagnosing Performance Variations in HPC Applications Using Machine Learning, 2017, ISC.
[22] Shi Jin, et al. Accurate anomaly detection using correlation-based time-series analysis in a core router system, 2016, 2016 IEEE International Test Conference (ITC).
[23] Andreas W. Kempa-Liehr, et al. Distributed and parallel time series feature extraction for industrial big data applications, 2016, arXiv.
[24] Behnaz Arzani, et al. Taking the Blame Game out of Data Centers Operations with NetPoirot, 2016, SIGCOMM.
[25] Sudipto Guha, et al. Robust Random Cut Forest Based Anomaly Detection on Streams, 2016, ICML.
[26] Marília Curado, et al. Expedite feature extraction for enhanced cloud anomaly detection, 2016, NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium.
[27] Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.
[28] Rafiul Ahad, et al. Toward Autonomic Cloud: Automatic Anomaly Detection and Resolution, 2015, 2015 International Conference on Cloud and Autonomic Computing.
[29] Mahesh Rajan, et al. Toward Rapid Understanding of Production HPC Applications and Systems, 2015, 2015 IEEE International Conference on Cluster Computing.
[30] Peter N. Brown, et al. Kripke - A Massively Parallel Transport Mini-App, 2015.
[31] Stephen L. Olivier, et al. Enabling Advanced Operational Analysis Through Multi-subsystem Data Integration on Trinity, 2015.
[32] Thomas W. Tucker, et al. The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] Robert B. Ross, et al. CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[34] Franck Cappello, et al. Addressing failures in exascale computing, 2014, Int. J. High Perform. Comput. Appl.
[35] Katherine E. Isaacs, et al. There goes the neighborhood: Performance degradation due to nearby jobs, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[36] Hal Finkel, et al. HACC, 2016, Commun. ACM.
[37] Bianca Schroeder, et al. Reading between the lines of failure logs: Understanding how HPC systems fail, 2013, 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[38] Yaguo Lei, et al. A review on empirical mode decomposition in fault diagnosis of rotating machinery, 2013.
[39] Nathaniel H. Hunt, et al. The Appropriate Use of Approximate Entropy and Sample Entropy with Short Data Sets, 2012, Annals of Biomedical Engineering.
[40] Gavin C. Cawley, et al. Baseline Methods for Active Learning, 2011, Active Learning and Experimental Design @ AISTATS.
[41] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[42] Vincent De Sapio, et al. Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example, 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).
[43] Christopher D. Manning, et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.
[44] Ling Huang, et al. Mining Console Logs for Large-Scale System Problem Detection, 2008, SysML.
[45] Christopher D. Manning, et al. Introduction to Information Retrieval, 2008, J. Assoc. Inf. Sci. Technol.
[46] Michael A. Bender, et al. Algorithmic Support for Commodity-Based Parallel Computing Systems, 2003.
[47] H. Sebastian Seung, et al. Selective Sampling Using the Query by Committee Algorithm, 1997, Machine Learning.
[48] Steve Plimpton. Fast parallel algorithms for short-range molecular dynamics, 1993.
[49] David H. Bailey, et al. The NAS parallel benchmarks summary and preliminary results, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[50] P. Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms, 1967.
[51] Vitus J. Leung, et al. Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems, 2021, ISC.
[52] Dong Zhou, et al. An Active Learning Method Based on Uncertainty and Complexity for Gearbox Fault Diagnosis, 2019, IEEE Access.
[53] Po-Ching Lin, et al. An Anomaly Detection Framework Based on ICA and Bayesian Classification for IaaS Platforms, 2016, KSII Trans. Internet Inf. Syst.
[54] Elisabeth Baseman, et al. Interpretable Anomaly Detection for Monitoring of High Performance Computing Systems, 2016.
[55] Burr Settles. Active Learning Literature Survey, 2009.
[56] Dana Angluin. Queries and concept learning, 1988, Machine Learning.
[57] S. Plimpton. Fast Parallel Algorithms for Short-Range Molecular Dynamics, 2022.