Hardware support for bulk data movement in server platforms

Bulk data movement occurs commonly in server workloads, and its performance is rather poor on today's microprocessors. We propose the use of small dedicated copy engines and present a detailed analysis of a bulk data copy engine architecture. We describe the hardware support required to implement the copy engine and to tightly integrate it into server platforms. Our evaluation is based on an execution-driven simulator that was extended with detailed models of bulk data movement engines. The simulation results show that dedicated engines are quite effective in eliminating the data movement overhead and are an attractive choice for handling bulk data in future high-performance server platforms.
