Scalability Bugs: When 100-Node Testing is Not Enough
Tanakorn Leesatapornwongsa | Jeffrey F. Lukman | Haryadi S. Gunawi | Riza O. Suminto | Huan Ke | Cesar A. Stuardo
[1] Haryadi S. Gunawi,et al. Why Does the Cloud Stop Computing?: Lessons from Hundreds of Service Outages , 2016, SoCC.
[2] Mike Hibler,et al. An integrated experimental environment for distributed systems and networks , 2002, OPSR.
[3] Tanakorn Leesatapornwongsa,et al. What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems , 2014, SoCC.
[4] Thomas F. Wenisch,et al. The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services , 2014, OSDI.
[5] Yu Luo,et al. Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems , 2014, OSDI.
[6] Amin Vahdat,et al. DieCast: Testing Distributed Systems with an Accurate Scale Model , 2008, TOCS.
[7] Michael I. Jordan,et al. Characterizing, modeling, and generating workload spikes for stateful services , 2010, SoCC.
[8] Yang Wang,et al. Exalt: Empowering Researchers to Evaluate Large-Scale Storage Systems , 2014, NSDI.
[9] Pallavi Joshi,et al. SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems , 2014, OSDI.
[10] Bowen Zhou,et al. Vrisha: using scaling properties of parallel programs for bug detection and localization , 2011, HPDC.
[11] Garth A. Gibson,et al. PRObE: A Thousand-Node Experimental Cluster for Computer Systems Research , 2013, ;login: Usenix Mag..
[12] Amin Vahdat,et al. To infinity and beyond: time warped network emulation , 2005, SOSP.
[13] Prashant Malik,et al. Cassandra: a decentralized structured storage system , 2010, OPSR.
[14] Yuanyuan Zhou,et al. Early Detection of Configuration Errors to Reduce Failure Damage , 2016, USENIX Annual Technical Conference.
[15] Tanakorn Leesatapornwongsa,et al. Limplock: understanding the impact of limpware on scale-out cloud systems , 2013, SoCC.
[16] Torsten Hoefler,et al. Using automated performance modeling to find scalability bugs in complex codes , 2013, International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[17] David Wolinsky,et al. Heading Off Correlated Failures through Independence-as-a-Service , 2014, OSDI.
[18] David E. Culler,et al. SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.
[19] Silas Boyd-Wickizer,et al. Securing Distributed Systems with Information Flow Control , 2008, NSDI.
[20] Yingwei Luo,et al. Failure Recovery: When the Cure Is Worse Than the Disease , 2013, HotOS.
[21] John K. Ousterhout. Is scale your enemy, or is scale your friend?: technical perspective , 2011, Commun. ACM.
[22] Srinath T. V. Setty,et al. IronFleet: proving practical distributed systems correct , 2015, SOSP.
[23] Martin Schulz,et al. Debugging high-performance computing applications at massive scales , 2015, Commun. ACM.
[24] Shan Lu,et al. TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems , 2016, ASPLOS.
[25] Tanakorn Leesatapornwongsa,et al. The Case for Drill-Ready Cloud Computing , 2014, SoCC.
[26] Marcos K. Aguilera,et al. Performance debugging for distributed systems of black boxes , 2003, SOSP.