Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters
Kesheng Wu | Wucherl Yoo | Yi Cao | Peter Nugent | Michelle S. Koo | Alex Sim | Mi-Soon Koo
[1] Mona Attariyan, et al. X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software, 2012, OSDI.
[2] Randy H. Katz, et al. Wrangler: Predictable and Faster Jobs using Fewer Resources, 2014, SoCC.
[3] Gregory R. Ganger, et al. Diagnosing Performance Changes by Comparing Request Flows, 2011, NSDI.
[4] James C. Browne, et al. Enabling comprehensive data-driven system management for large computational facilities, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[5] Kaushik Dutta, et al. Modeling virtualized applications using machine learning techniques, 2012, VEE '12.
[6] Edward A. Lee, et al. Scientific workflow management and the Kepler system, 2006, Concurr. Comput. Pract. Exp.
[7] Alexander Aiken, et al. Using correlated surprise to infer shared influence, 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[8] Jeffrey S. Chase, et al. Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control, 2004, OSDI.
[9] Wolfgang E. Nagel, et al. Performance Optimization for Large Scale Computing: The Scalable VAMPIR Approach, 2001, International Conference on Computational Science.
[10] Michael I. Jordan, et al. Detecting large-scale system problems by mining console logs, 2009, SOSP '09.
[11] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.
[12] Sam Shah, et al. Root cause detection in a service-oriented architecture, 2013, SIGMETRICS '13.
[13] Ernest E. Croner, et al. The Palomar Transient Factory: System Overview, Performance, and First Results, 2009, 0906.5350.
[14] Graham R. Nudd, et al. Performance modeling of parallel and distributed computing using PACE, 2000, Conference Proceedings of the 2000 IEEE International Performance, Computing, and Communications Conference (Cat. No.00CH37086).
[15] J. A. Hartigan, et al. A k-means clustering algorithm, 1979.
[16] Arshad Jhumka, et al. Linking Resource Usage Anomalies with System Failures from Cluster Log Data, 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.
[17] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[18] Si Liu, et al. System-level monitoring of floating-point performance to improve effective system utilization, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[19] Kesheng Wu, et al. PATHA: Performance Analysis Tool for HPC Applications, 2015, 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC).
[20] Markus Geimer, et al. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications, 2010, ICPP.
[21] Lars Koesterke, et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Richard Mortier, et al. Using Magpie for Request Extraction and Workload Modelling, 2004, OSDI.
[23] Jennifer Neville, et al. Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems, 2012, NSDI.
[24] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[25] Radu Prodan, et al. A Hybrid Intelligent Method for Performance Modeling and Prediction of Workflow Activities in Grids, 2009, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid.
[26] William E. Johnston, et al. The NetLogger methodology for high performance distributed systems performance analysis, 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).
[27] Kesheng Wu, et al. Implementing the Palomar Transient Factory Real-Time Detection Pipeline in GLADE: Results and Observations, 2014, DNIS.
[28] Erich Strohmaier, et al. A genetic algorithms approach to modeling the performance of memory-bound computations, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[29] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[30] Gregory R. Ganger, et al. Ironmodel: robust performance models in the wild, 2008, SIGMETRICS '08.
[31] Armando Fox, et al. Fingerprinting the datacenter: automated classification of performance crises, 2010, EuroSys '10.
[32] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[33] Jarek Nabrzyski, et al. Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] José A. B. Fortes, et al. On the Use of Machine Learning to Predict the Time and Resources Consumed by Applications, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[35] Pengfei Chen, et al. CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems, 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.
[36] Wei-Ying Ma, et al. Automated known problem diagnosis with event traces, 2006, EuroSys.
[37] Sangkyum Kim, et al. ADP: automated diagnosis of performance pathologies using hardware events, 2012, SIGMETRICS '12.
[38] Rajeev Gandhi, et al. Ganesha: blackBox diagnosis of MapReduce systems, 2010, PERV.