Understanding Application and System Performance Through System-Wide Monitoring
[1] Mark R. Fahey et al. User Environment Tracking and Problem Detection with XALT. 2014 First International Workshop on HPC User Support Tools, 2014.
[2] Lance M. Berc et al. Continuous profiling: where have all the cycles gone? ACM Trans. Comput. Syst., 1997.
[3] Brendan Gregg. Systems Performance: Enterprise and the Cloud. 2013.
[4] James C. Browne et al. Enabling comprehensive data-driven system management for large computational facilities. 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2013.
[5] Arshad Jhumka et al. Linking Resource Usage Anomalies with System Failures from Cluster Log Data. 2013 IEEE 32nd International Symposium on Reliable Distributed Systems, 2013.
[6] James C. Browne et al. Comprehensive Resource Use Monitoring for HPC Systems with TACC Stats. 2014 First International Workshop on HPC User Support Tools, 2014.
[7] Gregor von Laszewski et al. Comprehensive, open-source resource usage measurement and analysis for HPC systems. Concurr. Comput. Pract. Exp., 2014.
[8] Gang Ren et al. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers. IEEE Micro, 2010.
[9] Gregor von Laszewski et al. Using XDMoD to facilitate XSEDE operations, planning and analysis. XSEDE, 2013.
[10] Si Liu et al. System-level monitoring of floating-point performance to improve effective system utilization. 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
[11] James C. Browne et al. An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats. XSEDE '14, 2014.
[12] Kevin T. Pedretti et al. Demonstrating improved application performance using dynamic monitoring and task mapping. 2014 IEEE International Conference on Cluster Computing (CLUSTER), 2014.
[13] Thomas W. Tucker et al. The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.
[14] Ann C. Gentile et al. Infrastructure for In Situ System Monitoring and Application Data Analysis. ISAV@SC, 2015.
[15] Bert J. Debusschere et al. Ovis-2: A robust distributed architecture for scalable RAS. 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
[16] Gregor von Laszewski et al. Performance metrics and auditing framework using application kernels for high-performance computer systems. Concurr. Comput. Pract. Exp., 2013.
[17] Zheng Wang et al. System support for automatic profiling and optimization. SOSP, 1997.
[18] Zhenbang Chen et al. P-Tracer: Path-Based Performance Profiling in Cloud Computing Systems. 2012 IEEE 36th Annual Computer Software and Applications Conference, 2012.