[1] Takayuki Okamoto,et al. The Tofu Interconnect D , 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).
[2] Dan Alistarh,et al. Taming unbalanced training workloads in deep learning with partial collective operations , 2019, PPoPP.
[3] Thomas G. Robertazzi,et al. Input Versus Output Queueing on a Space-Division Packet Switch , 1993 .
[4] Torsten Hoefler,et al. A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[5] Torsten Hoefler,et al. Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
[6] Alexander Shpiner,et al. Dragonfly+: Low Cost Topology for Scaling Datacenters , 2017, 2017 IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB).
[7] Guihai Chen,et al. DCQCN+: Taming Large-Scale Incast Congestion in RDMA over Ethernet Networks , 2018, 2018 IEEE 26th International Conference on Network Protocols (ICNP).
[8] Torsten Hoefler,et al. The impact of network noise at large-scale communication performance , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[9] George Bosilca,et al. Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW , 2011, EuroMPI.
[10] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[11] Vishal Misra,et al. ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY , 2016, CoNEXT.
[12] Torsten Hoefler,et al. Bandwidth-optimal all-to-all exchanges in fat tree networks , 2013, ICS '13.
[13] Jack J. Dongarra,et al. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems , 2016, Int. J. High Perform. Comput. Appl.
[14] John M. Mellor-Crummey,et al. Understanding congestion in high performance interconnection networks using sampling , 2019, SC.
[15] Luiz André Barroso,et al. The tail at scale , 2013, CACM.
[16] Katherine E. Isaacs,et al. There goes the neighborhood: Performance degradation due to nearby jobs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[17] Samuel P. Morgan,et al. Input Versus Output Queueing on a Space-Division Packet Switch , 1987, IEEE Trans. Commun.
[18] Ricardo Bianchini,et al. Managing Tail Latency in Datacenter-Scale File Systems Under Production Constraints , 2019, EuroSys.
[19] Ming Zhang,et al. Congestion Control for Large-Scale RDMA Deployments , 2015, Comput. Commun. Rev.
[20] Christian E. Hopps,et al. Analysis of an Equal-Cost Multi-Path Algorithm , 2000, RFC.
[21] Torsten Hoefler,et al. FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short , 2019, ArXiv.
[22] Torsten Hoefler,et al. Mitigating network noise on Dragonfly networks through application-aware routing , 2019, SC.
[23] Daniel Sánchez,et al. Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[24] Torsten Hoefler,et al. Cost-effective diameter-two topologies: analysis and evaluation , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Eddie Kohler,et al. Speedy transactions in multicore in-memory databases , 2013, SOSP.
[26] Nicholas J. Wright,et al. GPCNeT: designing a benchmark suite for inducing and measuring contention in HPC networks , 2019, SC.
[27] Ching-Hsing Yu,et al. Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer , 2019, PEARC.
[28] Paul Lamere,et al. Sphinx-4: a flexible open source framework for speech recognition , 2004 .
[29] Martin Wimmer. Programming models for parallel computing , 2010 .
[30] Steve Plimpton,et al. Fast parallel algorithms for short-range molecular dynamics , 1993 .
[31] Mateo Valero,et al. On-the-Fly Adaptive Routing in High-Radix Hierarchical Networks , 2012, 2012 41st International Conference on Parallel Processing.
[32] Sally Floyd,et al. TCP and explicit congestion notification , 1994, Comput. Commun. Rev.
[33] Kevin Harms,et al. Characterization of MPI Usage on a Production Supercomputer , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Charles Clos,et al. A study of non-blocking switching networks , 1953 .
[35] Abhinav Bhatele,et al. Evaluation of an Interference-free Node Allocation Policy on Fat-tree Clusters , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] George Almási. PGAS (Partitioned Global Address Space) Languages , 2011, Encyclopedia of Parallel Computing.
[37] Steven Gottlieb,et al. Hybrid-molecular-dynamics algorithms for the numerical simulation of quantum chromodynamics. , 1987, Physical review. D, Particles and fields.
[38] Ankit Singla,et al. Jellyfish: Networking Data Centers Randomly , 2011, NSDI.
[39] Amin Vahdat,et al. A scalable, commodity data center network architecture , 2008, SIGCOMM '08.
[40] D. Roweth,et al. Cray XC® Series Network , 2012 .
[41] Torsten Hoefler,et al. Slim Fly: A Cost Effective Low-Diameter Network Topology , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[42] Jens Domke,et al. Mitigating Inter-Job Interference Using Adaptive Flow-Aware Routing , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[43] Jack Dongarra,et al. LINPACK Users' Guide , 1987 .
[44] Robert B. Ross,et al. Watch Out for the Bully! Job Interference Study on Dragonfly Network , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[45] Shin'ichi Miura,et al. HyperX topology: first at-scale implementation and comparison to the fat-tree , 2019, SC.
[46] Yuval Tamir,et al. High-performance multiqueue buffers for VLSI communication switches , 1988, The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.
[47] Eric Borch,et al. Megafly: A Topology for Exascale Systems , 2018, ISC.
[48] Jung Ho Ahn,et al. HyperX: topology, routing, and packaging of efficient large-scale networks , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[49] Thomas E. Anderson,et al. High-speed switch scheduling for local-area networks , 1993, TOCS.
[50] Torsten Hoefler,et al. Scientific Benchmarking of Parallel Computing Systems: Twelve ways to tell the masses when reporting performance results , 2017 .
[51] Michael Dinitz,et al. Xpander: Unveiling the Secrets of High-Performance Datacenters , 2015, HotNets.
[52] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl.
[53] Rajeev Thakur,et al. MPI at Exascale , 2010 .