Virtualized I/O

[1] Seth Copen Goldstein, et al. Active messages: a mechanism for integrating communication and computation, 1998, ISCA '98.

[2] Robert W. Numrich, et al. Co-array Fortran for parallel programming, 1998, FORF.

[3] Karl S. Hemmert, et al. High message rate, NIC-based atomics: Design and performance considerations, 2008, 2008 IEEE International Conference on Cluster Computing.

[4] Thorsten von Eicken, et al. U-Net: a user-level network interface for parallel and distributed computing, 1995, SOSP.

[5] Fabrizio Petrini, et al. Transparent system-level migration of PGAS applications using Xen on InfiniBand, 2007, 2007 IEEE International Conference on Cluster Computing.

[6] Karsten Schwan, et al. Resource-Aware Distributed Stream Management Using Dynamic Overlays, 2005, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05).

[7] Chris Smith, et al. An Open Grid Services Architecture Primer, 2009, Computer.

[8] Wolf-Dietrich Weber, et al. Power provisioning for a warehouse-sized computer, 2007, ISCA '07.

[9] Charles E. Leiserson. Fat-trees: Universal networks for hardware-efficient supercomputing, 1985, IEEE Transactions on Computers.

[10] Dilma Da Silva, et al. Libra: a library operating system for a JVM in a virtualized execution environment, 2007, VEE '07.

[11] Patrick Th. Eugster, et al. Type-based publish/subscribe: Concepts and experiences, 2007, TOPL.

[12] Rupak Biswas, et al. Impact of the Columbia Supercomputer on NASA Science and Engineering Applications, 2005, IWDC.

[13] Alan L. Cox, et al. Achieving 10 Gb/s using safe and transparent network interface virtualization, 2009, VEE '09.

[14] Qian Zhang, et al. A Compound TCP Approach for High-Speed and Long Distance Networks, 2006, Proceedings IEEE INFOCOM 2006. 25th IEEE International Conference on Computer Communications.

[15] Rolf Riesen, et al. SUNMOS for the Intel Paragon - a brief user's guide, 1994.

[16] Karsten Schwan, et al. Lightweight Morphing Support for Evolving Middleware Data Exchanges in Distributed Applications, 2005, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05).

[17] Fabrizio Petrini, et al. Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[18] Scott Rixner, et al. An efficient programmable 10 gigabit Ethernet network interface card, 2005, 11th International Symposium on High-Performance Computer Architecture.

[19] Adit Ranadive, et al. Performance implications of virtualizing multicore cluster machines, 2008, HPCVirt '08.

[20] David F. Heidel, et al. An Overview of the BlueGene/L Supercomputer, 2002, ACM/IEEE SC 2002 Conference (SC'02).

[21] Sally Floyd, et al. Simulation-based comparisons of Tahoe, Reno and SACK TCP, 1996, CCRV.

[22] Amith R. Mamidala, et al. Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective, 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).

[23] D. E. Culler, et al. Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[24] L. W. Tucker, et al. Architecture and applications of the Connection Machine, 1988, Computer.

[25] William Gropp, et al. Design and implementation of message-passing services for the Blue Gene/L supercomputer, 2005, IBM J. Res. Dev.

[26] Qin Wei, et al. A formal concurrency model based architecture description language for synthesis of software development tools, 2004.

[27] Wu-chun Feng, et al. Asymmetric interactions in symmetric multi-core systems: analysis, enhancements and evaluation, 2008, HiPC 2008.

[28] Marvin H. Solomon, et al. Dense Trivalent Graphs for Processor Interconnection, 1982, IEEE Transactions on Computers.

[29] Norman P. Jouppi, et al. High-performance Ethernet-based communications for future multi-core processors, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[30] Charles Clos. A study of non-blocking switching networks, 1953.

[31] Duncan H. Lawrie. Access and Alignment of Data in an Array Processor, 1975, IEEE Transactions on Computers.

[32] Keith D. Underwood, et al. A preliminary analysis of the MPI queue characteristics of several applications, 2005, 2005 International Conference on Parallel Processing (ICPP'05).

[33] Joel H. Saltz, et al. The virtual microscope, 2003, IEEE Transactions on Information Technology in Biomedicine.

[34] Philip Heidelberger, et al. The deep computing messaging framework: generalized scalable message passing on the Blue Gene/P supercomputer, 2008, ICS '08.

[35] Ivan Stojmenovic, et al. Honeycomb Networks: Topological Properties and Communication Algorithms, 1997, IEEE Trans. Parallel Distributed Syst.

[36] Courtenay T. Vaughan, et al. A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark, 2006, 2006 IEEE International Conference on Cluster Computing.

[37] David B. Loveman. High performance Fortran, 1993, IEEE Parallel & Distributed Technology: Systems & Applications.

[38] Harold S. Stone. Parallel Processing with the Perfect Shuffle, 1971, IEEE Transactions on Computers.

[39] Rolf Riesen, et al. Instruction-level simulation of a cluster at scale, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[40] Brent Callaghan, et al. NFS over RDMA, 2003, NICELI '03.

[41] Taisuke Boku, et al. The architecture of massively parallel processor CP-PACS, 1997, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis.

[42] V. Glushkov. The abstract theory of automata, 1961.

[43] Harold S. Stone, et al. Dynamic Memories with Enhanced Data Access, 1972, IEEE Transactions on Computers.

[44] Russ Miller, et al. Data Movement Techniques for the Pyramid Computer, 1987, SIAM J. Comput.

[45] Dharma P. Agrawal, et al. Generalized Hypercube and Hyperbus Structures for a Computer Network, 1984, IEEE Transactions on Computers.

[46] Fred Kuhns, et al. A remotely accessible network processor-based router for network experimentation, 2008, ANCS '08.

[47] Karl S. Hemmert, et al. An architecture to perform NIC based MPI matching, 2007, 2007 IEEE International Conference on Cluster Computing.

[48] Dhabaleswar K. Panda, et al. Nomad: migrating OS-bypass networks in virtual machines, 2007, VEE '07.

[49] Robert E. Kahn, et al. A Protocol for Packet Network Intercommunication, 1974.

[50] Sheldon B. Akers, et al. A Group-Theoretic Model for Symmetric Interconnection Networks, 1989, IEEE Trans. Computers.

[51] Xiaola Lin, et al. Recursive Cube of Rings: A New Topology for Interconnection Networks, 2000, IEEE Trans. Parallel Distributed Syst.

[52] Karsten Schwan, et al. Service Augmentation for High End Interactive Data Services, 2005, 2005 IEEE International Conference on Cluster Computing.

[53] Thorsten von Eicken, et al. Evolution of the Virtual Interface Architecture, 1998, Computer.

[54] Andrew A. Chien, et al. Software overhead in messaging layers: where does the time go?, 1994, ASPLOS VI.

[55] Mohan Kumar, et al. Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes, 1992, IEEE Trans. Parallel Distributed Syst.

[56] Ron Brightwell, et al. Architectural specification for massively parallel computers: an experience and measurement-based approach, 2003, Concurr. Pract. Exp.

[57] D. Tolmie, et al. HIPPI: simplicity yields success, 1993, IEEE Network.

[58] Janak H. Patel, et al. Processor-memory interconnections for multiprocessors, 1979, ISCA '79.

[59] Samuel Thibault, et al. Improving performance by embedding HPC applications in lightweight Xen domains, 2008, HPCVirt '08.

[60] Jian Liu, et al. Optical MEMS devices for telecom systems, 2003, SPIE Microtechnologies.

[61] Wu-chun Feng, et al. A comparison of TCP automatic tuning techniques for distributed computing, 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.

[62] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.

[63] José Duato, et al. High-radix crossbar switches enabled by proximity communication, 2008, HiPC 2008.

[64] Dhabaleswar K. Panda, et al. High Performance Remote Memory Access Communication: The ARMCI Approach, 2006, Int. J. High Perform. Comput. Appl.

[65] Dave Olson, et al. PathScale InfiniPath: a first look, 2005, 13th Symposium on High Performance Interconnects (HOTI'05).

[66] Johannes Gehrke, et al. Cayuga: a high-performance event processing engine, 2007, SIGMOD '07.

[67] Hyun-Wook Jin, et al. Designing next generation data-centers with advanced communication protocols and systems services, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[68] Wei Huang, et al. High performance virtual machine migration with RDMA over modern interconnects, 2007, 2007 IEEE International Conference on Cluster Computing.

[69] Ulrich Brüning, et al. An open-source HyperTransport core, 2008, TRETS.

[70] R. Brightwell, et al. Design and implementation of MPI on Puma portals, 1996, Proceedings. Second MPI Developer's Conference.

[71] Scott Pakin, et al. The Impact of Message-buffer Alignment on Communication Performance, 2005, Parallel Process. Lett.

[72] Franco P. Preparata, et al. The cube-connected-cycles: A versatile network for parallel computation, 1979, 20th Annual Symposium on Foundations of Computer Science (sfcs 1979).

[73] Muli Ben-Yehuda, et al. Loosely Coupled TCP Acceleration Architecture, 2006, 14th IEEE Symposium on High-Performance Interconnects (HOTI'06).

[74] Nicholas Pippenger, et al. On Crossbar Switching Networks, 1975, IEEE Trans. Commun.

[75] Viktor K. Prasanna, et al. A Memory-Balanced Linear Pipeline Architecture for Trie-based IP Lookup, 2007.

[76] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.

[77] Johannes Gehrke, et al. Towards Expressive Publish/Subscribe Systems, 2006, EDBT.

[78] Rami G. Melhem, et al. On the Feasibility of Optical Circuit Switching for High Performance Computing Systems, 2005, ACM/IEEE SC 2005 Conference (SC'05).

[79] Karl S. Hemmert, et al. A hardware acceleration unit for MPI queue processing, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[80] Pascal Caron, et al. Characterization of Glushkov automata, 2000, Theor. Comput. Sci.

[81] Huai-An Lin, et al. Estimation of the optimal performance of ASN.1/BER transfer syntax, 1993, CCRV.

[82] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[83] Vivek Sarkar, et al. X10: an object-oriented approach to non-uniform cluster computing, 2005, OOPSLA '05.

[84] Fabrizio Petrini, et al. Hardware- and software-based collective communication on the Quadrics network, 2001, Proceedings IEEE International Symposium on Network Computing and Applications. NCA 2001.

[85] Darren J. Kerbyson. A look at application performance sensitivity to the bandwidth and latency of InfiniBand networks, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[86] Dhiraj K. Pradhan, et al. The De Bruijn Multiprocessor Network: A Versatile Parallel Processing and Sorting Network for VLSI, 1989, IEEE Trans. Computers.

[87] D. Frank Hsu, et al. Distributed Loop Computer Networks: A Survey, 1995, J. Parallel Distributed Comput.

[88] Injong Rhee, et al. CUBIC: a new TCP-friendly high-speed TCP variant, 2008, OPSR.

[89] Ian T. Foster, et al. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, 2001, Int. J. High Perform. Comput. Appl.

[90] Rajgopal Kannan. The KR-Benes Network: A Control-Optimal Rearrangeable Permutation Network, 2005, IEEE Trans. Computers.

[91] William J. Dally, et al. Flattened butterfly: a cost-efficient topology for high-radix networks, 2007, ISCA '07.

[92] Patrick Crowley, et al. Application development on hybrid systems, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[93] Dhabaleswar K. Panda, et al. Can user-level protocols take advantage of multi-CPU NICs?, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[94] Keith D. Underwood, et al. A preliminary analysis of the InfiniPath and XD1 network interfaces, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[95] William J. Dally, et al. Performance Analysis of k-Ary n-Cube Interconnection Networks, 1987, IEEE Trans. Computers.

[96] Fabrizio Petrini, et al. Accelerating Real-Time String Searching with Multicore Processors, 2008, Computer.

[97] G. Lafontant, et al. Packaging the Cell Broadband Engine microprocessor for supercomputer applications, 2008, 2008 58th Electronic Components and Technology Conference.

[98] Fabrizio Petrini, et al. Peak-Performance DFA-based String Matching on the Cell Processor, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[99] David A. Patterson, et al. X-Tree: A tree structured multi-processor computer architecture, 1978, ISCA '78.

[100] Tilman Wolf, et al. Massively Parallel Anomaly Detection in Online Network Measurement, 2008, 2008 Proceedings of 17th International Conference on Computer Communications and Networks.

[101] Larry L. Peterson, et al. binpac: a yacc for writing application protocol parsers, 2006, IMC '06.

[102] B. W. Arden, et al. Analysis of Chordal Ring Network, 1981, IEEE Transactions on Computers.

[103] P. Wyckoff, et al. EMP: Zero-Copy OS-Bypass NIC-Driven Gigabit Ethernet Message Passing, 2001, ACM/IEEE SC 2001 Conference (SC'01).

[104] William Gropp, et al. MPI-2: Extending the Message-Passing Interface, 1996, Euro-Par, Vol. I.

[105] Jason Leigh, et al. Reliable Blast UDP: predictable high performance bulk data transfer, 2002, Proceedings. IEEE International Conference on Cluster Computing.

[106] Michael J. Flynn. Very high-speed computing systems, 1966.

[107] Cheng Jin, et al. FAST TCP: Motivation, Architecture, Algorithms, Performance, 2006, IEEE/ACM Transactions on Networking.

[108] Yi Huang, et al. WS-Messenger: a Web services-based messaging system for service-oriented grid computing, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).

[109] Willy Zwaenepoel, et al. Diagnosing performance overheads in the Xen virtual machine environment, 2005, VEE '05.

[110] Keith D. Underwood, et al. Accelerating List Management for MPI, 2005, 2005 IEEE International Conference on Cluster Computing.

[111] Rajkumar Buyya, et al. High Performance Mass Storage and Parallel I/O: Technologies and Applications, 2001.

[112] Yoshiko Yasuda, et al. Architecture and performance of the Hitachi SR2201 massively parallel processor system, 1997, Proceedings 11th International Parallel Processing Symposium.

[113] Karthick Rajamani, et al. Energy Management for Commercial Servers, 2003, Computer.

[114] Jack J. Dongarra, et al. The LINPACK Benchmark: past, present and future, 2003, Concurr. Comput. Pract. Exp.

[115] Yin-Ling Liong, et al. The Scheduled Transfer (ST) Protocol, 1999, CANPC.

[116] Robert W. Horst, et al. ServerNet deadlock avoidance and fractahedral topologies, 1996, Proceedings of International Conference on Parallel Processing.

[117] Wenke Lee, et al. Secure and Flexible Monitoring of Virtual Machines, 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[118] Adrian Schüpbach, et al. The multikernel: a new OS architecture for scalable multicore systems, 2009, SOSP '09.