Passive Realtime Datacenter Fault Detection and Localization
Alex C. Snoeren | Arjun Roy | Hongyi Zeng | Jasmeet Bagga
[1] Amin Vahdat, et al. Dahu: Commodity switches for direct connect data center networks, 2013, Architectures for Networking and Communications Systems.
[2] Marcos K. Aguilera, et al. Performance debugging for distributed systems of black boxes, 2003, SOSP '03.
[3] Nick McKeown, et al. I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks, 2014, NSDI.
[4] Paramvir Bahl, et al. Towards highly reliable enterprise network services via inference of multi-level dependencies, 2007, SIGCOMM '07.
[5] Vijay Mann, et al. Living on the edge: Monitoring network flows at the edge in cloud data centers, 2013, 2013 Fifth International Conference on Communication Systems and Networks (COMSNETS).
[6] N. Duffield, et al. Network loss tomography using striped unicast probes, 2006, IEEE/ACM Transactions on Networking.
[7] Ben Y. Zhao, et al. Packet-Level Telemetry in Large Datacenter Networks, 2015, SIGCOMM.
[9] Marcos K. Aguilera, et al. WAP5: black-box performance debugging for wide-area systems, 2006, WWW '06.
[10] Matthew Roughan, et al. IP forwarding anomalies and improving their detection using multiple data sources, 2004, NetT '04.
[11] Yuan Yu, et al. Dryad: distributed data-parallel programs from sequential building blocks, 2007, EuroSys '07.
[12] George Varghese, et al. Gestalt: Fast, Unified Fault Localization for Networked Systems, 2014, USENIX Annual Technical Conference.
[13] Alex C. Snoeren, et al. Inside the Social Network's (Datacenter) Network, 2015, Comput. Commun. Rev.
[14] Mudhakar Srivatsa, et al. A Framework for Distributed Monitoring and Root Cause Analysis for Large IP Networks, 2009, 2009 28th IEEE International Symposium on Reliable Distributed Systems.
[15] Xin Wu, et al. NetPilot: automating datacenter network failure mitigation, 2012, SIGCOMM '12.
[16] Albert G. Greenberg, et al. IP fault localization via risk modeling, 2005, NSDI.
[17] Albert G. Greenberg, et al. Detection and Localization of Network Black Holes, 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.
[18] B. Welford. Note on a Method for Calculating Corrected Sums of Squares and Products, 1962.
[19] Behnaz Arzani, et al. Taking the Blame Game out of Data Centers Operations with NetPoirot, 2016, SIGCOMM.
[20] George Forman, et al. Automated Whole-System Diagnosis of Distributed Services Using Model-Based Reasoning, 1998.
[21] Albert G. Greenberg, et al. Fault Localization via Risk Modeling, 2010, IEEE Transactions on Dependable and Secure Computing.
[22] Hong Liu, et al. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network, 2015, Comput. Commun. Rev.
[23] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[24] Eric A. Brewer, et al. Pinpoint: problem determination in large, dynamic Internet services, 2002, Proceedings International Conference on Dependable Systems and Networks.
[25] Stefan Savage, et al. California fault lines: understanding the causes and impact of network failures, 2010, SIGCOMM '10.
[26] S. Savage, et al. On Failure in Managed Enterprise Networks, 2012.
[27] Abdul Kabbani, et al. FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks, 2014, CoNEXT.
[28] David A. Patterson, et al. Path-Based Failure and Evolution Management, 2004, NSDI.
[29] Srikanth Kandula, et al. Shrink: a tool for failure diagnosis in IP networks, 2005, MineNet '05.
[31] Hua Chen, et al. Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis, 2015, SIGCOMM.
[32] D. Zats, et al. DeTail: reducing the flow completion time tail in datacenter networks, 2012, CCRV.