End-to-End Considerations in Unification of High-Performance IO
[1] David P. Anderson,et al. The performance of message‐passing using restricted virtual memory remapping , 1991, Softw. Pract. Exp..
[2] Chris Maeda,et al. Networking performance for microkernels , 1992, [1992] Proceedings Third Workshop on Workstation Operating Systems.
[3] Keir Fraser,et al. Arsenic: a user-accessible gigabit Ethernet interface , 2001, Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213).
[4] Vivek S. Pai,et al. SSDAlloc: Hybrid SSD/RAM Memory Management Made Easy , 2011, NSDI.
[5] Nick McKeown,et al. OpenFlow: enabling innovation in campus networks , 2008, CCRV.
[6] Mahadev Satyanarayanan,et al. The ITC distributed file system: principles and design , 1985, SOSP 1985.
[7] William J. Bolosky,et al. Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.
[8] Animesh Trivedi,et al. jVerbs: ultra-low latency for data center applications , 2013, SoCC.
[9] David R. Cheriton,et al. Improving Server Application Performance via Pure TCP ACK Receive Optimization , 2013, USENIX Annual Technical Conference.
[10] John H. Howard,et al. Scale and performance in a distributed file system , 1988 .
[11] J.M. Smith,et al. Giving applications access to Gb/s networking , 1993, IEEE Network.
[12] J. Larus,et al. Tempest and Typhoon: user-level shared memory , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[13] Kenneth C. Knowlton,et al. A fast storage allocator , 1965, CACM.
[14] David L. Black,et al. IANA Registries for the Remote Direct Data Placement (RDDP) Protocols , 2012, RFC.
[15] Brian N. Bershad,et al. Extensibility, safety and performance in the SPIN operating system , 1995, SOSP.
[16] Luigi Rizzo,et al. netmap: A Novel Framework for Fast Packet I/O , 2012, USENIX ATC.
[17] Chris I. Dalton,et al. User-space protocols deliver high performance to applications on a low-cost Gb/s LAN , 1994, SIGCOMM 1994.
[18] Ronald B. Brightwell,et al. Scalability limitations of VIA-based technologies in supporting MPI , 2000 .
[19] Jeffrey C. Mogul,et al. TCP Offload Is a Dumb Idea Whose Time Has Come , 2003, HotOS.
[20] Dhabaleswar K. Panda,et al. High performance RDMA-based design of HDFS over InfiniBand , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Dhabaleswar K. Panda,et al. Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[22] Michael M. Swift,et al. Hathi: durable transactions for memory using flash , 2012, DaMoN '12.
[23] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[24] Rajesh K. Gupta,et al. Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[25] Brian N. Bershad,et al. An I/O System for Mach 3.0 , 1991, USENIX MACH Symposium.
[26] Thorsten von Eicken,et al. Incorporating Memory Management into User-Level Network Interfaces , 1997 .
[27] Animesh Trivedi,et al. A case for RDMA in clouds: turning supercomputer networking into commodity , 2011, APSys.
[28] L. Grossman. Large Receive Offload implementation in Neterion 10GbE Ethernet driver , 2010 .
[29] Dutch T. Meyer,et al. Strata: scalable high-performance storage on virtualized non-volatile memory , 2014, FAST.
[30] Mahadev Satyanarayanan,et al. Lightweight Recoverable Virtual Memory , 1993, SOSP.
[31] Roy H. Campbell,et al. Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory , 2011, FAST.
[32] Peter M. Chen,et al. Free transactions with Rio Vista , 1997, SOSP.
[33] Trevor N. Mudge,et al. FlashCache: a NAND flash memory file cache for low power web servers , 2006, CASES '06.
[34] Jacob Nelson,et al. Latency-Tolerant Software Distributed Shared Memory , 2015, USENIX ATC.
[35] Sanjay Kumar,et al. System software for persistent memory , 2014, EuroSys '14.
[36] Animesh Trivedi,et al. Wimpy Nodes with 10GbE: Leveraging One-Sided Operations in Soft-RDMA to Boost Memcached , 2012, USENIX ATC.
[37] Torsten Hoefler,et al. DARE: High-Performance State Machine Replication on RDMA Networks , 2015, HPDC.
[38] C. Dalton,et al. Afterburner (network-independent card for protocols) , 1993, IEEE Network.
[39] Laxmi N. Bhuyan,et al. A new server I/O architecture for high speed networks , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[40] Shimin Chen,et al. FlashLogging: exploiting flash devices for synchronous logging performance , 2009, SIGMOD Conference.
[41] Bruce S. Davie. A host-network interface architecture for ATM , 1991, SIGCOMM '91.
[42] Srihari Makineni,et al. Architectural characterization of TCP/IP packet processing on the Pentium® M microprocessor , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[43] Jialin Li,et al. Towards High-Performance Application-Level Storage Management , 2014, HotStorage.
[44] Scott Rixner,et al. An efficient programmable 10 gigabit Ethernet network interface card , 2005, 11th International Symposium on High-Performance Computer Architecture.
[45] Derek McAuley,et al. Protocol and Interface for ATM LANs , 1994, J. High Speed Networks.
[46] D. C. Feldmeier. Multiplexing issues in communication system design , 1990, SIGCOMM 1990.
[47] Rajesh Gupta,et al. From ARIES to MARS: transaction support for next-generation, solid-state drives , 2013, SOSP.
[48] Peter Druschel,et al. Lazy receiver processing (LRP): a network subsystem architecture for server systems , 1996, OSDI '96.
[49] Arun Jagatheesan,et al. Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[50] Dhabaleswar K. Panda,et al. Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[51] Pradeep Dubey,et al. Architecting to achieve a billion requests per second throughput on a single key-value store server platform , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[52] Milon Mackey,et al. An implementation of the Hamlyn sender-managed interface architecture , 1996, OSDI '96.
[53] Miguel Castro,et al. FaRM: Fast Remote Memory , 2014, NSDI.
[54] Hemal Shah,et al. Direct Data Placement over Reliable Transports , 2007, RFC.
[55] Jonathan M. Smith,et al. Hardware/Software Organization of a High-Performance ATM Host Interface , 1993, IEEE J. Sel. Areas Commun..
[56] Brad Fitzpatrick,et al. Distributed caching with memcached , 2004 .
[57] Greg J. Regnier,et al. TCP onloading for data center servers , 2004, Computer.
[58] M. Abadi,et al. Naiad: a timely dataflow system , 2013, SOSP.
[59] Michael M. Swift,et al. FlashVM: Virtual Memory Management on Flash , 2010, USENIX Annual Technical Conference.
[60] David Banks,et al. A High-Performance Network Architecture for a PA-RISC Workstation , 1993, IEEE J. Sel. Areas Commun..
[61] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[62] Michael M. Swift,et al. Aerie: flexible file-system interfaces to storage-class memory , 2014, EuroSys '14.
[63] Willy Zwaenepoel,et al. Optimizing TCP Receive Performance , 2008, USENIX ATC.
[64] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[65] Andrea C. Arpaci-Dusseau,et al. Transforming policies into mechanisms with infokernel , 2003, SOSP '03.
[66] David D. Clark,et al. Architectural considerations for a new generation of protocols , 1990, SIGCOMM '90.
[67] Mendel Rosenblum,et al. The design and implementation of a log-structured file system , 1991, SOSP '91.
[68] Greg J. Regnier,et al. TCP performance re-visited , 2003, 2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003..
[69] Larry L. Peterson,et al. RPC in the x-Kernel: evaluating new design techniques , 1989, SOSP '89.
[70] Sandia Report,et al. The Portals 4.0 Message Passing Interface , 2008 .
[71] Willy Zwaenepoel,et al. IO-Lite: a unified I/O buffering and caching system , 1999, TOCS.
[72] Mark Silberstein,et al. GPUnet , 2014, OSDI.
[73] G. Chesson,et al. Protocol engine design , 1988 .
[74] Steven Swanson,et al. QuickSAN: a storage area network for fast, distributed, solid state disks , 2013, ISCA.
[75] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.
[76] Eric A. Brewer,et al. Cluster-based scalable network services , 1997, SOSP.
[77] Shekhar Y. Borkar,et al. Supporting systolic and memory communication in iWarp , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[78] Jae-Myung Kim,et al. A case for flash memory ssd in enterprise database applications , 2008, SIGMOD Conference.
[79] Ravishankar K. Iyer,et al. Addressing TCP/IP processing challenges using the IA and IXP processors , 2003 .
[80] David L Tennenhouse. Layered Multiplexing Considered Harmful , 2008 .
[81] Alexandros Labrinidis,et al. Challenges and Opportunities with Big Data , 2012, Proc. VLDB Endow..
[82] Christian F. Tschudin,et al. Flexible protocol stacks , 1991, SIGCOMM '91.
[83] Ronald G. Dreslinski,et al. Performance analysis of system overheads in TCP/IP workloads , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[84] Russel Sandberg,et al. The Sun Network Filesystem: Design, Implementation and Experience , 2001 .
[85] Andrea C. Arpaci-Dusseau,et al. Deploying Safe User-Level Network Services with icTCP , 2004, OSDI.
[86] Larry L. Peterson,et al. Making paths explicit in the Scout operating system , 1996, OSDI '96.
[87] Katerina J. Argyraki,et al. RouteBricks: exploiting parallelism to scale software routers , 2009, SOSP '09.
[88] Margo I. Seltzer,et al. Structure and Performance of the Direct Access File System , 2002, USENIX ATC, General Track.
[89] Thomas R. Gross,et al. RStore: A Direct-Access DRAM-based Data Store , 2015, 2015 IEEE 35th International Conference on Distributed Computing Systems.
[90] Hyeontaek Lim,et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage , 2014, NSDI.
[91] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[92] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[93] Peter Steenkiste. A systematic approach to host interface design for high-speed networks , 1994, Computer.
[94] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[95] Sayantan Sur,et al. Early Evaluation of Scalable Fabric Interface for PGAS Programming Models , 2014, PGAS.
[96] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[97] David Flynn,et al. DFS: A file system for virtualized flash storage , 2010, TOS.
[98] Ricardo Bianchini,et al. The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[99] Amin Vahdat,et al. Chronos: predictable low latency for data center applications , 2012, SoCC '12.
[100] K. K. Ramakrishnan,et al. Eliminating receive livelock in an interrupt-driven kernel , 1996, TOCS.
[101] Brian Zill,et al. Protocol implementation on the Nectar Communication Processor , 1990, SIGCOMM 1990.
[102] Philippe Bonnet,et al. I/O Speculation for the Microsecond Era , 2014, USENIX Annual Technical Conference.
[103] Jeffrey S. Chase,et al. End system optimizations for high-speed TCP , 2001, IEEE Commun. Mag..
[104] David Woodhouse,et al. JFFS: The Journalling Flash File System , 2001 .
[105] Jon Howell,et al. Flat Datacenter Storage , 2012, OSDI.
[106] Michael Burrows,et al. Performance of Firefly RPC , 1990, ACM Trans. Comput. Syst..
[107] David R. Cheriton,et al. Software-Controlled Caches in the VMP Multiprocessor , 1986, ISCA.
[108] Eric A. Brewer,et al. Remote queues: exposing message queues for optimization and atomicity , 1995, SPAA '95.
[109] John Wilkes. Hamlyn: an interface for sender-based communications , 1992 .
[110] Edoardo Biagioni. A structured TCP in standard ML , 1994, SIGCOMM 1994.
[111] John K. Ousterhout,et al. Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.
[112] Steven Swanson,et al. Refactor, Reduce, Recycle: Restructuring the I/O Stack for the Future of Storage , 2013, Computer.
[113] Renato Recio,et al. A Remote Direct Memory Access Protocol Specification , 2007, RFC.
[114] Philippe Bonnet,et al. Linux block IO: introducing multi-queue SSD access on multi-core systems , 2013, SYSTOR '13.
[115] Rajesh K. Gupta,et al. NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.
[116] Calton Pu,et al. High Performance Sockets and RPC over Virtual Interface (VI) Architecture , 1999, CANPC.
[117] Alan L. Cox,et al. An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems , 2006, USENIX Annual Technical Conference, General Track.
[118] Peter Desnoyers,et al. Analytic Models of SSD Write Performance , 2014, TOS.
[119] Margo I. Seltzer,et al. Making the Most Out of Direct-Access Network Attached Storage , 2003, FAST.
[120] Ali G. Saidi,et al. Integrated network interfaces for high-bandwidth TCP/IP , 2006, ASPLOS XII.
[121] Mendel Rosenblum,et al. Fast crash recovery in RAMCloud , 2011, SOSP.
[122] Erich M. Nahum,et al. Cache behavior of network protocols , 1997, SIGMETRICS '97.
[123] Paul E. McKenney,et al. Efficient demultiplexing of incoming TCP packets , 1992, SIGCOMM 1992.
[124] P. Druschel,et al. Soft timers: efficient microsecond software timer support for network processing , 2000, OPSR.
[125] Rajesh K. Gupta,et al. Onyx: A Prototype Phase Change Memory Storage Array , 2011, HotStorage.
[126] James Pinkerton,et al. Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security , 2007, RFC.
[127] D. R. Cheriton,et al. VMTP: Versatile Message Transaction Protocol , 1988 .
[128] Robert Tappan Morris,et al. Improving network connection locality on multicore systems , 2012, EuroSys '12.
[129] Christoforos E. Kozyrakis,et al. IX: A Protected Dataplane Operating System for High Throughput and Low Latency , 2014, OSDI.
[130] Terence Kelly,et al. Failure-atomic msync(): a simple and efficient mechanism for preserving the integrity of durable data , 2013, EuroSys '13.
[131] Evangelos P. Markatos,et al. Speeding up TCP/IP: faster processors are not enough , 2002, Conference Proceedings of the IEEE International Performance, Computing, and Communications Conference (Cat. No.02CH37326).
[132] Sylvia Ratnasamy,et al. SoftNIC: A Software NIC to Augment Hardware , 2015 .
[133] Luis Ceze,et al. Operating System Implications of Fast, Cheap, Non-Volatile Memory , 2011, HotOS.
[134] Thorsten von Eicken,et al. U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.
[135] Hemal Shah,et al. DA: Datamover Architecture for the Internet Small Computer System Interface (iSCSI) , 2007, RFC.
[136] Eunyoung Jeong,et al. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems , 2014, NSDI.
[137] Brian N. Bershad,et al. An Extensible Protocol Architecture for Application-Specific Networking , 1996, USENIX Annual Technical Conference.
[138] Michael M. Swift,et al. FlashVM: Revisiting the Virtual Memory Hierarchy , 2009, HotOS.
[139] Michael M. Swift,et al. FlashTier: a lightweight, consistent and durable storage cache , 2012, EuroSys '12.
[140] Haralampos Pozidis,et al. Trends in Storage Technologies , 2010, IEEE Data Eng. Bull..
[141] Babak Falsafi,et al. Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[142] Robert B. Ross,et al. Distributing the Data Plane for Remote Storage Access , 2015, HotOS.
[143] Hiroshi Motoda,et al. A Flash-Memory Based File System , 1995, USENIX.
[144] Henry M. Levy,et al. Limits to low-latency communication on high-speed networks , 1993, TOCS.
[145] Renato John Recio. Server I/O networks past, present, and future , 2003, NICELI '03.
[146] Wolfgang Rehm,et al. Providing a High-Performance VIA-Module for LAM/MPI , 2004 .
[147] Mendel Rosenblum,et al. Network Interface Design for Low Latency Request-Response Protocols , 2013, USENIX ATC.
[148] Frank Hady,et al. When poll is better than interrupt , 2012, FAST.
[149] Michael Stumm,et al. Exception-Less System Calls for Event-Driven Servers , 2011, USENIX Annual Technical Conference.
[150] Michael Wu,et al. eNVy: a non-volatile, main memory storage system , 1994, ASPLOS VI.
[151] Babak Falsafi,et al. Manycore Network Interfaces for in-memory rack-scale computing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[152] Paolo Faraboschi,et al. Operating System Support for NVM+DRAM Hybrid Main Memory , 2009, HotOS.
[153] Richard F. Rashid,et al. The Integration of Virtual Memory Management and Interprocess Communication in Accent , 1986, ACM Trans. Comput. Syst..
[154] Liviu Iftode,et al. Software support for virtual memory-mapped communication , 1996, Proceedings of International Conference on Parallel Processing.
[155] David J. Lilja,et al. High performance solid state storage under Linux , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[156] Jim Zelenka,et al. A cost-effective, high-bandwidth storage architecture , 1998, ASPLOS VIII.
[157] A. L. Narasimha Reddy,et al. SCMFS: A file system for Storage Class Memory , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[158] Scott Shenker,et al. Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.
[159] Gustavo Alonso,et al. Server-efficient high-definition media dissemination , 2009, NOSSDAV '09.
[160] Dhabaleswar K. Panda,et al. Sockets Direct Protocol over InfiniBand in clusters: is it beneficial? , 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[161] Robert B. Ross,et al. On the role of burst buffers in leadership-class storage systems , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[162] Brian N. Bershad,et al. Protocol service decomposition for high-performance networking , 1994, SOSP '93.
[163] Jeffrey S. Chase,et al. Trapeze/IP: TCP/IP at Near-Gigabit Speeds , 1999 .
[164] Animesh Trivedi,et al. DaRPC: Data Center RPC , 2014, SoCC.
[165] David D. Clark,et al. An analysis of TCP processing overhead , 1988, IEEE Communications Magazine.
[166] Bruce Jacob,et al. The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization , 2009, ISCA '09.
[167] Alessandro Curioni,et al. Rebasing I/O for Scientific Computing: Leveraging Storage Class Memory in an IBM BlueGene/Q Supercomputer , 2014, ISC.
[168] W. Daniel Hillis,et al. The network architecture of the Connection Machine CM-5 (extended abstract) , 1992, SPAA '92.
[169] H. T. Kung,et al. A Host Interface Architecture for High-Speed Networks , 1992, HPN.
[170] Byung-Gon Chun,et al. MegaPipe: A New Programming Interface for Scalable Network I/O , 2012, OSDI.
[171] William I. Nowicki,et al. NFS: Network File System Protocol specification , 1989, RFC.
[172] Mendel Rosenblum,et al. It's Time for Low Latency , 2011, HotOS.
[173] Richard W. Watson,et al. Gaining efficiency in transport services by appropriate design and implementation choices , 1987, TOCS.
[174] Willy Zwaenepoel,et al. The peregrine high‐performance RPC system , 1993, Softw. Pract. Exp..
[175] Thomas E. Anderson,et al. FlexNIC: Rethinking Network DMA , 2015, HotOS.
[176] 신웅. OS I/O path optimizations for flash solid-state drives , 2017 .
[177] Erich M. Nahum,et al. Server Network Scalability and TCP Offload , 2005, USENIX Annual Technical Conference, General Track.
[178] Haixun Wang,et al. Trinity: a distributed graph engine on a memory cloud , 2013, SIGMOD '13.
[179] Joseph Pasquale,et al. Profiling and reducing processing overheads in TCP/IP , 1996, TNET.
[180] Peter Druschel,et al. Cache and TLB Effectiveness in the Processing of Network Data , 1993 .
[181] Thomas R. Gross,et al. Unified High-Performance I/O: One Stack to Rule Them All , 2013, HotOS.
[182] Christopher Frost,et al. Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.
[183] David G. Andersen,et al. Using vector interfaces to deliver millions of IOPS from a networked key-value storage server , 2012, SoCC '12.
[184] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[185] Charles L. Seitz,et al. Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.
[186] Reynold Xin,et al. GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.
[187] Eric Anderson,et al. Efficiency matters! , 2010, OPSR.
[188] Jim Zelenka,et al. File server scaling with network-attached secure disks , 1997, SIGMETRICS '97.
[189] H. T. Kung,et al. The design of nectar: a network backplane for heterogeneous multicomputers , 1989, ASPLOS III.
[190] Larry L. Peterson,et al. Fbufs: a high-bandwidth cross-domain transfer facility , 1994, SOSP '93.
[191] Steve Scott,et al. Performance of the CRAY T3E Multiprocessor , 1997, SC.
[192] Kai Li,et al. Protected, user-level DMA for the SHRIMP network interface , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[193] Joo Young Hwang,et al. F2FS: A New File System for Flash Storage , 2015, FAST.
[194] Dawson R. Engler,et al. Exokernel: an operating system architecture for application-level resource management , 1995, SOSP.
[195] Jeffrey C. Mogul. Network Locality at the Scale of Processes , 1992, ACM Trans. Comput. Syst..
[196] Pavan Balaji,et al. Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck , 2004 .
[197] Michael Stumm,et al. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls , 2010, OSDI.
[198] William Gropp,et al. Learning from the Success of MPI , 2001, HiPC.
[199] Gustavo Alonso,et al. Minimizing the Hidden Cost of RDMA , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.
[200] Charlie Johnson,et al. IBM Power Edge of Network Processor: A Wire-Speed System on a Chip , 2011, IEEE Micro.
[201] Trevor Blackwell. Speeding up protocols for small messages , 1996, SIGCOMM 1996.
[202] Muli Ben-Yehuda,et al. IsoStack - Highly Efficient Network Processing on Dedicated Cores , 2010, USENIX Annual Technical Conference.
[203] Eran Gabber,et al. The Case Against User-Level Networking , 2004 .
[204] Hemal Shah,et al. Remote Direct Memory Access (RDMA) Protocol Extensions , 2014, RFC.
[205] Michael M. Swift,et al. Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.
[206] P. Pierce,et al. The Paragon implementation of the NX message passing interface , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[207] David R. Cheriton,et al. The VMP network adapter board (NAB): high-performance network communication for multiprocessors , 1988, SIGCOMM 1988.
[208] Thu D. Nguyen,et al. Implementing network protocols at user level , 1993, TNET.
[209] Dana S. Henry,et al. A tightly-coupled processor-network interface , 1992, ASPLOS V.
[210] José Carlos Brustoloni,et al. Effects of buffering semantics on I/O performance , 1996, OSDI '96.
[211] David E. Culler,et al. High-performance local area communication with fast sockets , 1997 .
[212] Larry L. Peterson,et al. The x-Kernel: An Architecture for Implementing Network Protocols , 1991, IEEE Trans. Software Eng..
[213] Brian Zill,et al. Software support for outboard buffering and checksumming , 1995, SIGCOMM '95.
[214] Christopher R. Johnson,et al. PIKA: A Network Service for Multikernel Operating Systems , 2014 .
[215] Brent Callaghan,et al. NFS over RDMA , 2003, NICELI '03.
[216] Larry L. Peterson,et al. Design of the x-kernel , 1988, SIGCOMM '88.
[217] Kai Li,et al. Storage alternatives for mobile computers , 1994, OSDI '94.
[218] Irfan Ahmad,et al. vIC: Interrupt Coalescing for Virtual Machine Storage Device IO , 2011, USENIX Annual Technical Conference.
[219] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[220] Peter F. Corbett,et al. The Direct Access File System , 2003, FAST.
[221] Henry M. Levy,et al. Separating data and control transfer in distributed operating systems , 1994, ASPLOS VI.
[222] Bingsheng He,et al. NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems , 2015, FAST.
[223] A. Gupta,et al. The Stanford FLASH multiprocessor , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[224] Rina Panigrahy,et al. Design Tradeoffs for SSD Performance , 2008, USENIX ATC.
[225] Hsiao-Keng Jerry Chu,et al. Zero-Copy TCP in Solaris , 1996, USENIX Annual Technical Conference.
[226] Aled Edwards,et al. Experiences implementing a high performance TCP in user-space , 1995, SIGCOMM '95.
[227] Torsten Hoefler,et al. Remote Memory Access Programming in MPI-3 , 2015, TOPC.
[228] David D. Clark,et al. The structuring of systems using upcalls , 1985, SOSP '85.
[229] Anthony Skjellum,et al. Design, implementation, and performance evaluation of MPI 3.0 on portals 4.0 , 2013, EuroMPI.
[230] Heon Young Yeom,et al. Dynamic Interval Polling and Pipelined Post I/O Processing for Low-Latency Storage Class Memory , 2013, HotStorage.
[231] Steven Swanson,et al. Providing safe, user space access to fast, solid state disks , 2012, ASPLOS XVII.
[232] Onur Mutlu,et al. Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.
[233] Katherine Yelick,et al. Porting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT , 2009 .
[234] Jian Xu,et al. Bankshot: caching slow storage in fast non-volatile memory , 2013, INFLOW '13.
[235] Sayantan Sur,et al. A Brief Introduction to the OpenFabrics Interfaces - A New Network API for Maximizing High Performance Application Efficiency , 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.
[236] Randall R. Stewart,et al. Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation , 2007, RFC.
[237] William J. Dally,et al. The J-machine Multicomputer: An Architectural Evaluation , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[238] Peter Druschel,et al. Experiences with a high-speed network adaptor: a software perspective , 1994, SIGCOMM 1994.
[239] Ian Watson,et al. The Manchester prototype dataflow computer , 1985, CACM.
[240] Mark Handley,et al. Network stack specialization for performance , 2015, SIGCOMM 2015.
[241] Larry L. Peterson,et al. A language-based approach to protocol implementation , 1993, TNET.
[242] David G. Andersen,et al. Using RDMA efficiently for key-value services , 2015, SIGCOMM 2015.
[243] Timothy Roscoe,et al. Modeling NICs with Unicorn , 2013, PLOS '13.
[244] Richard P. Martin,et al. Effects Of Communication Latency, Overhead, And Bandwidth In A Cluster Architecture , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[245] Arkady Kanevsky,et al. Enhanced Remote Direct Memory Access (RDMA) Connection Establishment , 2012, RFC.
[246] Thomas L. Sterling,et al. BEOWULF: A Parallel Workstation for Scientific Computation , 1995, ICPP.
[247] Jeffrey S. Chase,et al. On the elusive benefits of protocol offload , 2003, NICELI '03.
[248] Thomas F. Wenisch,et al. Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.
[249] Evangelos Eleftheriou,et al. Container Marking: Combining Data Placement, Garbage Collection and Wear Levelling for Flash , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.
[250] Michael M. Swift,et al. Storage-class memory needs flexible interfaces , 2013, APSys.
[251] Ashish Gupta,et al. The RAMCloud Storage System , 2015, ACM Trans. Comput. Syst..
[252] Joel Dylan Coburn. Providing fast and safe access to next-generation, non-volatile memories , 2012 .
[253] Todor I. Mollov,et al. Quill: Exploiting Fast Non-Volatile Memory by Transparently Bypassing the File System , 2013 .
[254] K. K. Ramakrishnan,et al. Performance Considerations in Designing Network Interfaces , 1993, IEEE J. Sel. Areas Commun..
[255] Adrian Schüpbach,et al. The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.
[256] Joseph Pasquale,et al. The importance of non-data touching processing overheads in TCP/IP , 1993, SIGCOMM 1993.
[257] David E. Culler,et al. An Implementation and Analysis of the Virtual Interface Architecture , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[258] Eddie Kohler,et al. A readable TCP in the Prolac protocol language , 1999, SIGCOMM '99.
[259] George Bosilca,et al. UCX: An Open Source Framework for HPC Network APIs and Beyond , 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.
[260] Steven Swanson,et al. Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications , 2009, ASPLOS.
[261] Sanjay Ghemawat,et al. The Google file system , 2003 .
[262] Ren-Shuo Liu,et al. NVM duet: unified working memory and persistent store architecture , 2014, ASPLOS.
[263] Robert Grimm,et al. Application performance and flexibility on exokernel systems , 1997, SOSP.
[264] Andrea C. Arpaci-Dusseau,et al. De-indirection for flash-based SSDs with nameless writes , 2012, FAST.
[265] Jeffrey C. Mogul,et al. The packet filter: an efficient mechanism for user-level network code , 1987, SOSP '87.
[266] Orion Hodson,et al. Whole-system persistence , 2012, ASPLOS XVII.
[267] Sayantan Sur,et al. Memcached Design on High Performance RDMA Capable Interconnects , 2011, 2011 International Conference on Parallel Processing.
[268] F. Bitz,et al. Host interface design for ATM LANs , 1991, [1991] Proceedings 16th Conference on Local Computer Networks.
[269] Jian Yang,et al. Mojim: A Reliable and Highly-Available Non-Volatile Memory System , 2015, ASPLOS.
[270] Babak Falsafi,et al. Scale-out NUMA , 2014, ASPLOS.
[271] Harry Rudin,et al. A Survey of Light-Weight Protocols for High-Speed Networks , 1994 .
[272] Jeffrey C. Mogul,et al. The effect of context switches on cache performance , 1991, ASPLOS IV.
[273] Ram Huggahalli,et al. Direct cache access for high bandwidth network I/O , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[274] Timothy Roscoe,et al. Arrakis , 2014, OSDI.
[275] Dhabaleswar K. Panda,et al. Beyond block I/O: Rethinking traditional storage primitives , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[276] Jens Teubner,et al. A Spinning Join That Does Not Get Dizzy , 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.
[277] David G. Andersen,et al. The Case for VOS: The Vector Operating System , 2011, HotOS.
[278] Hyeonsang Eom,et al. Optimizing the Block I/O Subsystem for Fast Storage Devices , 2014, ACM Trans. Comput. Syst..
[279] Dhruva R. Chakrabarti,et al. Implications of CPU Caching on Byte-addressable Non-Volatile Memory Programming , 2012 .
[280] Qin Jin,et al. Persistent B+-Trees in Non-Volatile Main Memory , 2015, Proc. VLDB Endow..
[281] Andrea C. Arpaci-Dusseau,et al. ANViL: Advanced Virtualization for Modern Non-Volatile Memory Devices , 2015, FAST.
[282] Antony I. T. Rowstron,et al. IOFlow: a software-defined storage architecture , 2013, SOSP.
[283] Dahlia Malkhi,et al. CORFU: A Shared Log Design for Flash Clusters , 2012, NSDI.
[284] Philip Werner Frey,et al. Zero-copy network communication: An applicability study of iWARP beyond micro benchmarks , 2010 .
[285] Jinyang Li,et al. Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store , 2013, USENIX ATC.