A scalable and high performance software iSCSI implementation

In this paper we present two novel techniques for improving the performance of the Internet Small Computer Systems Interface (iSCSI) protocol, which is the basis for IP-based networked block storage today. We demonstrate that a few modifications to an existing iSCSI implementation increase iSCSI protocol processing throughput from 1.4 Gbps to 3.6 Gbps. Our solution scales with the CPU clock speed and can be implemented in software on any general-purpose processor, without specialized iSCSI protocol processing hardware. To gain an in-depth understanding of the processing costs associated with an iSCSI protocol implementation, we built an iSCSI fast path in a user-level sandbox environment. We discovered that the generation of Cyclic Redundancy Codes (CRCs), which is required for data integrity, and the data copy operations, which are required for the interaction between iSCSI and TCP, represent the main bottlenecks in iSCSI protocol processing. We propose two optimizations to iSCSI implementations to address these bottlenecks. The first changes the way CRCs are calculated: we replace the industry-standard algorithm proposed by Prof. Dilip Sarwate with 'Slicing-by-8' (SB8), a new algorithm that can, in principle, read arbitrarily large amounts of data at a time while keeping its memory requirement at a reasonable level. The second changes the way iSCSI interacts with the TCP layer: we interleave the compute-intensive data integrity checks with the memory-access-intensive data copy operations to benefit from cache effects and hardware pipeline parallelism.
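The two optimizations can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it shows only the structure of the algorithms, not their performance. It assumes the CRC32C (Castagnoli) polynomial used for iSCSI digests. The function names (`make_tables`, `crc32c_sarwate`, `crc32c_sb8`, `copy_with_crc`) and the 64-byte chunk size are hypothetical choices made for illustration. The Sarwate loop consumes one byte per table lookup, the slicing-by-8 loop consumes eight bytes per iteration using eight tables, and `copy_with_crc` updates the CRC on each chunk right after copying it, one plausible reading of the interleaving described above.

```python
POLY = 0x82F63B78  # reflected CRC32C (Castagnoli) polynomial; assumed here

def make_tables(n=8):
    """Build n tables of 256 entries; table 0 alone is the Sarwate table."""
    t0 = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        t0.append(crc)
    tables = [t0]
    for k in range(1, n):
        prev = tables[k - 1]
        # Table k advances each table k-1 entry by one more zero byte.
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables

TABLES = make_tables()

def crc32c_sarwate(data, crc=0):
    """Byte-at-a-time table lookup (Sarwate)."""
    crc ^= 0xFFFFFFFF
    t0 = TABLES[0]
    for b in data:
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

def crc32c_sb8(data, crc=0):
    """Slicing-by-8: eight input bytes and eight lookups per iteration."""
    crc ^= 0xFFFFFFFF
    T = TABLES
    n = len(data) - len(data) % 8
    for i in range(0, n, 8):
        lo = crc ^ int.from_bytes(data[i:i + 4], "little")
        hi = int.from_bytes(data[i + 4:i + 8], "little")
        crc = (T[7][lo & 0xFF] ^ T[6][(lo >> 8) & 0xFF]
               ^ T[5][(lo >> 16) & 0xFF] ^ T[4][lo >> 24]
               ^ T[3][hi & 0xFF] ^ T[2][(hi >> 8) & 0xFF]
               ^ T[1][(hi >> 16) & 0xFF] ^ T[0][hi >> 24])
    t0 = T[0]
    for b in data[n:]:  # leftover bytes fall back to the byte-wise loop
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

def copy_with_crc(src, dst, chunk=64):
    """Copy src into dst chunk by chunk, updating the CRC right after each copy."""
    crc = 0
    for off in range(0, len(src), chunk):
        block = src[off:off + chunk]
        dst[off:off + len(block)] = block
        crc = crc32c_sb8(block, crc)  # running CRC is carried across chunks
    return crc

if __name__ == "__main__":
    payload = bytes(range(256)) * 5 + b"tail"
    out = bytearray(len(payload))
    digest = copy_with_crc(payload, out)
    print(hex(digest), digest == crc32c_sarwate(payload), bytes(out) == payload)
```

The eight tables occupy 8 × 256 32-bit entries (8 KB in a C implementation), which is the kind of modest memory cost the abstract refers to. In a compiled implementation the benefit of the interleaved loop would come from computing the CRC over data that the copy has just brought into the cache; Python does not expose that effect.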
