[1] Jim Euchner. Design , 2014, Catalysis from A to Z.
[2] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[3] A. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.
[4] Martin Schulz,et al. PNMPI tools: a whole lot greater than the sum of their parts , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[5] Jon Louis Bentley,et al. Quad trees a data structure for retrieval on composite keys , 1974, Acta Informatica.
[6] Steve Poole,et al. ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[7] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[8] Torsten Hoefler,et al. PEMOGEN: Automatic adaptive performance modeling during program runtime , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[9] Ying Qian,et al. Efficient shared memory and RDMA based collectives on multi-rail QsNetII SMP clusters , 2008, Cluster Computing.
[10] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[11] Jack J. Dongarra,et al. Decision Trees and MPI Collective Algorithm Selection Problem , 2007, Euro-Par.
[12] Matthew N. Anyanwu,et al. Comparative Analysis of Serial Decision Tree Classification Algorithms , 2009 .
[13] Joe D. Warren,et al. The program dependence graph and its use in optimization , 1987, TOPLAS.
[14] Armin R. Mikler,et al. NetPIPE: A Network Protocol Independent Performance Evaluator , 1996 .
[15] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[16] Xin Yuan,et al. CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters , 2003, PPoPP '03.
[17] Torsten Hoefler,et al. Exploiting Offload-Enabled Network Interfaces , 2015, IEEE Micro.
[18] D. Panda,et al. High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters , 2005, HiPC.
[19] J.C. Sancho,et al. Quantifying the Potential Benefit of Overlapping Communication and Computation in Large-Scale Scientific Applications , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[20] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[21] James Demmel,et al. Statistical Models for Empirical Search-Based Performance Tuning , 2004, Int. J. High Perform. Comput. Appl..
[22] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[23] Katherine Yelick,et al. Introduction to UPC and Language Specification , 2000 .
[24] Edgar Gabriel,et al. A Tool for Optimizing Runtime Parameters of Open MPI , 2008, PVM/MPI.
[25] Kewei Cheng,et al. Feature Selection , 2016, ACM Comput. Surv..
[26] D. Martin Swany,et al. Gravel: A Communication Library to Fast Path MPI , 2008, PVM/MPI.
[27] Patricia J. Teller,et al. MPI Advisor: a Minimal Overhead Tool for MPI Library Performance Tuning , 2015, EuroMPI.
[28] Jie Wang,et al. Optimizing MPI Runtime Parameter Settings by Using Machine Learning , 2009, PVM/MPI.
[29] Thomas G. Dietterich,et al. Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms , 2008 .
[30] Robert Hecht-Nielsen,et al. Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.
[31] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[32] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[33] Katherine Yelick,et al. Titanium: a high-performance Java dialect , 1998 .
[34] Torsten Hoefler,et al. Using Compiler Techniques to Improve Automatic Performance Modeling , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[35] Xin Yuan,et al. Automatic generation and tuning of MPI collective communication routines , 2005, ICS '05.
[36] Xin Yuan,et al. STAR-MPI: self tuned adaptive routines for MPI collective operations , 2006, ICS '06.
[37] Sushmitha P. Kini,et al. Fast and Scalable Barrier Using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters , 2003, PVM/MPI.
[38] Lior Rokach,et al. Pattern Classification Using Ensemble Methods , 2009, Series in Machine Perception and Artificial Intelligence.
[39] Sayantan Sur,et al. Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[40] Emilio Corchado,et al. A survey of multiple classifier systems as hybrid systems , 2014, Inf. Fusion.
[41] Manish Gupta,et al. Compiler-controlled extraction of computation-communication overlap in MPI applications , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[42] Torsten Hoefler,et al. Principles for coordinated optimization of computation and communication in large-scale parallel systems , 2008 .
[43] Jason Duell,et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing , 2005, Int. J. High Perform. Comput. Appl..
[44] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[45] Lori Pollock,et al. Implementing an Open64-based Tool for Improving the Performance of MPI Programs , 2008 .
[46] D. Panda,et al. Efficient Barrier and Allreduce on IBA clusters using hardware multicast and adaptive algorithms , 2004 .
[47] Nectarios Koziris,et al. A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping , 2003, J. Parallel Distributed Comput..
[48] Martin Schulz,et al. Formal analysis of MPI-based parallel programs , 2011, Commun. ACM.
[49] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[50] Darren J. Kerbyson,et al. Improving the Performance of Multiple Conjugate Gradient Solvers by Exploiting Overlap , 2008, Euro-Par.
[51] K. J. Ottenstein,et al. Data-flow graphs as an intermediate program form. , 1978 .
[52] D. Martin Swany,et al. Transformations to Parallel Codes for Communication-Computation Overlap , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[53] D. Martin Swany,et al. Photon: Remote Memory Access Middleware for High-Performance Runtime Systems , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[54] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[55] Torsten Hoefler,et al. Using automated performance modeling to find scalability bugs in complex codes , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[56] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis , 1964 .
[57] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[58] D. Martin Swany,et al. Automatic MPI application transformation with ASPhALT , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[59] Kees Verstoep,et al. Fast Measurement of LogP Parameters for Message Passing Platforms , 2000, IPDPS Workshops.
[60] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[61] Wei Chen,et al. Message Strip-Mining Heuristics for High Speed Networks , 2004, VECPAR.
[62] Jeffrey M. Squyres,et al. The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms* , 2005 .
[63] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[64] D. Martin Swany,et al. MPI-aware compiler optimizations for improving communication-computation overlap , 2009, ICS.
[65] G. Fagg,et al. Flexible collective communication tuning architecture applied to Open MPI , 2006 .
[66] Luis Díaz de Cerio,et al. A Method for Exploiting Communication/Computation Overlap in Hypercubes , 1998, Parallel Comput..
[67] Greg Bronevetsky,et al. Communication-Sensitive Static Dataflow for Parallel Message Passing Applications , 2009, 2009 International Symposium on Code Generation and Optimization.
[68] Jack J. Dongarra,et al. MPI Collective Algorithm Selection and Quadtree Encoding , 2006, PVM/MPI.
[69] E. Smith. Methods of Multivariate Analysis , 1997 .
[70] D. Quinlan,et al. ROSE: Compiler Support for Object-Oriented Frameworks , 1999 .
[71] Torsten Hoefler,et al. Design, Implementation, and Usage of LibNBC , 2006 .
[72] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[73] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[74] Katherine A. Yelick,et al. Optimizing bandwidth limited problems using one-sided communication and overlap , 2005, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[75] Torsten Hoefler,et al. Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations , 2014, Supercomput. Front. Innov..
[76] Torsten Hoefler,et al. A Survey of Barrier Algorithms for Coarse Grained Supercomputers , 2005 .
[77] David Patterson,et al. LogP: towards a realistic model of parallel computation , 1993 .
[78] Jelena Pjesivac-Grbovic,et al. Towards Automatic and Adaptive Optimizations of MPI Collective Operations , 2007 .
[79] David E. Culler,et al. U-Net/SLE: A Java-based user-customizable virtual network interface , 1999, Sci. Program..
[80] Torsten Hoefler,et al. Optimizing a conjugate gradient solver with non-blocking collective operations , 2006, Parallel Comput..
[81] Paul D. Hovland,et al. Data-Flow Analysis for MPI Programs , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[82] Torsten Hoefler,et al. Leveraging non-blocking collective communication in high-performance applications , 2008, SPAA '08.
[83] Rolf Rabenseifner,et al. Automatic Profiling of MPI Applications with Hardware Performance Counters , 1999, PVM/MPI.
[84] Torsten Hoefler,et al. Automatic Performance Modeling of HPC Applications , 2016, Software for Exascale Computing.
[85] Thomas L. Sterling,et al. ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications , 2009, 2009 International Conference on Parallel Processing Workshops.
[86] Lori Pollock,et al. Program Flow Graph Construction for Static Analysis of Explicitly Parallel Message-Passing Programs , 2000 .