Topology-aware network fault influence domain analysis
[1] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[2] Amin Vahdat, et al. Hedera: Dynamic Flow Scheduling for Data Center Networks, 2010, NSDI.
[3] Ishai Menache, et al. Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can, 2015, SIGCOMM.
[4] Philip Heidelberger, et al. Blue Gene/L torus interconnection network, 2005, IBM J. Res. Dev.
[5] Haixun Wang, et al. Online Anomaly Prediction for Robust Cluster Systems, 2009, 2009 IEEE 25th International Conference on Data Engineering.
[6] Franck Cappello, et al. Toward Exascale Resilience: 2014 update, 2014, Supercomput. Front. Innov.
[7] Henri Casanova, et al. Swap-And-Randomize: A Method for Building Low-Latency HPC Interconnects, 2015, IEEE Transactions on Parallel and Distributed Systems.
[8] Song Fu, et al. Failure-aware resource management for high-availability computing clusters with distributed virtual machines, 2010, J. Parallel Distributed Comput.
[9] Canqun Yang, et al. MilkyWay-2 supercomputer: system and application, 2014, Frontiers of Computer Science.
[10] Barry Pangrle. News on Energy-Efficient Large-Scale Computing, 2016.
[11] Gen Li, et al. Iaso: an autonomous fault-tolerant management system for supercomputers, 2014, Frontiers of Computer Science.
[12] Toshiyuki Shimizu, et al. Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers, 2009, Computer.
[13] Amin Vahdat, et al. A scalable, commodity data center network architecture, 2008, SIGCOMM '08.
[14] Henri Casanova, et al. Skywalk: A Topology for HPC Networks with Low-Delay Switches, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[15] Mario Gerla, et al. On the Topological Design of Distributed Computer Networks, 1977, IEEE Trans. Commun.
[16] Saurabh Gupta, et al. Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems, 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[17] Yun Zhou, et al. The Reliability Wall for Exascale Supercomputing, 2012, IEEE Transactions on Computers.
[18] Kash Barker, et al. Resilience-based network component importance measures, 2013, Reliab. Eng. Syst. Saf.
[19] Gabriel Antoniu, et al. Chronos: Failure-aware scheduling in shared Hadoop clusters, 2015, 2015 IEEE International Conference on Big Data (Big Data).
[20] Yi Zheng, et al. The TH Express high performance interconnect networks, 2014, Frontiers of Computer Science.