Watch Out for the Bully! Job Interference Study on Dragonfly Network
Robert B. Ross | Misbah Mubarak | Xu Yang | Zhiling Lan | John Jenkins
[1] Robert B. Ross, et al. A case study in using massively parallel simulation for extreme-scale torus network codesign, 2014, SIGSIM PADS '14.
[2] Dan Tsafrir, et al. Backfilling Using System-Generated Predictions Rather than User Runtime Estimates, 2007, IEEE Transactions on Parallel and Distributed Systems.
[3] Nan Jiang, et al. Indirect adaptive routing on large scale interconnection networks, 2009, ISCA '09.
[4] Christopher D. Carothers, et al. Warp speed: executing time warp on 1,966,080 cores, 2013, SIGSIM-PADS.
[5] Mike Higgins, et al. Cray Cascade: A scalable HPC system based on a Dragonfly network, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Christopher D. Carothers, et al. ROSS: a high-performance, low memory, modular time warp system, 2000, PADS '00.
[7] Robert B. Ross, et al. Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation, 2016, SIGSIM-PADS.
[8] V. E. Henson, et al. BoomerAMG: a parallel algebraic multigrid solver and preconditioner, 2002.
[9] Nan Jiang, et al. A detailed and flexible cycle-accurate Network-on-Chip simulator, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[10] Xu Yang, et al. Improving Batch Scheduling on Blue Gene/Q by Relaxing 5D Torus Network Allocation Constraints, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[11] John Kim, et al. Overcoming far-end congestion in large-scale networks, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[12] Kevin T. Pedretti, et al. Demonstrating improved application performance using dynamic monitoring and task mapping, 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[13] Katherine E. Isaacs, et al. There goes the neighborhood: Performance degradation due to nearby jobs, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Christopher D. Carothers, et al. Modeling Billion-Node Torus Networks Using Massively Parallel Discrete-Event Simulation, 2011, 2011 IEEE Workshop on Principles of Advanced and Distributed Simulation.
[15] Laxmikant V. Kalé, et al. Avoiding hot-spots on two-level direct networks, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[16] Torsten Hoefler, et al. Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks, 2014, HPDC '14.
[17] Robert B. Ross, et al. Enabling Parallel Simulation of Large-Scale HPC Network Systems, 2017, IEEE Transactions on Parallel and Distributed Systems.
[18] D. Skinner, et al. Understanding the causes of performance variability in HPC workloads, 2005, Proceedings of the IEEE Workload Characterization Symposium, 2005.
[19] Cyriel Minkenberg, et al. Quiet Neighborhoods: Key to Protect Job Performance Predictability, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[20] William J. Dally, et al. Technology-Driven, Highly-Scalable Dragonfly Topology, 2008, 2008 International Symposium on Computer Architecture.
[21] Laxmikant V. Kalé, et al. Maximizing Throughput on a Dragonfly Network, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Robert B. Ross, et al. Modeling a Million-Node Dragonfly Network Using Massively Parallel Discrete-Event Simulation, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[23] William J. Dally, et al. Cost-Efficient Dragonfly Topology for Large-Scale Systems, 2009, IEEE Micro.
[24] Valerio Pascucci, et al. Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[25] Robert B. Ross, et al. CODES: Enabling Co-Design of Multi-Layer Exascale Storage Architectures, 2011.