Making Problem Diagnosis Work for Large-Scale, Production Storage Systems
[1] Rajeev Gandhi,et al. Draco: Statistical diagnosis of chronic problems in large distributed systems , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[2] Armando Fox,et al. Detecting application-level failures in component-based Internet services , 2005, IEEE Transactions on Neural Networks.
[3] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[4] Amin Vahdat,et al. Pip: Detecting the Unexpected in Distributed Systems , 2006, NSDI.
[5] Rajeev Gandhi,et al. Ganesha: black-box diagnosis of MapReduce systems , 2010, PERV.
[6] Bianca Schroeder,et al. A Large-Scale Study of Failures in High-Performance Computing Systems , 2006, IEEE Transactions on Dependable and Secure Computing.
[7] Armando Fox,et al. Fingerprinting the datacenter: automated classification of performance crises , 2010, EuroSys '10.
[8] Marcos K. Aguilera,et al. Performance debugging for distributed systems of black boxes , 2003, SOSP '03.
[9] Chris Newman,et al. Date and Time on the Internet: Timestamps , 2002, RFC.
[10] Frank B. Schmuck,et al. GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.
[11] Rajeev Gandhi,et al. Black-Box Problem Diagnosis in Parallel File Systems , 2010, FAST.
[12] Richard Mortier,et al. Using Magpie for Request Extraction and Workload Modelling , 2004, OSDI.
[13] Barton P. Miller,et al. The Paradyn Parallel Performance Measurement Tool , 1995, Computer.
[14] Vanish Talwar,et al. Online detection of utility cloud anomalies using metric distributions , 2010, 2010 IEEE Network Operations and Management Symposium - NOMS 2010.
[15] C. Pipper,et al. ["R"--project for statistical computing] , 2008, Ugeskrift for laeger.
[16] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl.
[17] Kwan-Liu Ma,et al. Visual analysis of I/O system behavior for high-end computing , 2011, LSAP '11.
[18] Robert Latham,et al. I/O performance challenges at leadership scale , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[19] Robert Latham,et al. 24/7 Characterization of petascale I/O workloads , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[20] D. Freedman,et al. On the histogram as a density estimator: L2 theory , 1981 .
[21] Robert Latham,et al. Understanding and improving computational science storage access through continuous characterization , 2011, 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST).
[22] Rajeev Gandhi,et al. Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters , 2012, LISA.
[23] Nikolaj Bjørner,et al. Latent fault detection in large scale services , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[24] Eric A. Brewer,et al. Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.
[25] Bianca Schroeder,et al. Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? , 2007, FAST.
[26] Haifeng Chen,et al. PeerWatch: a fault detection and diagnosis tool for virtualized consolidation systems , 2010, ICAC '10.
[27] Herbert A. Sturges,et al. The Choice of a Class Interval , 1926 .