IOMMU: strategies for mitigating the IOTLB bottleneck

The input/output memory management unit (IOMMU) was recently introduced into mainstream computer architecture when both Intel and AMD added IOMMUs to their chipsets. An IOMMU provides memory protection from I/O devices by enabling system software to control which areas of physical memory an I/O device may access. This protection, however, incurs additional direct memory access (DMA) overhead due to the required address resolution and validation. IOMMUs include an input/output translation lookaside buffer (IOTLB) to speed up address resolution, but every IOTLB miss still causes a substantial increase in DMA latency and degrades the performance of DMA-intensive workloads. In this paper we first demonstrate the potential negative impact of IOTLB misses on workload performance. We then propose both system-software and hardware enhancements that reduce the IOTLB miss rate and accelerate address resolution. These enhancements can reduce the IOTLB miss rate by over 60% for common I/O-intensive workloads.
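The sensitivity of the IOTLB miss rate to mapping granularity can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup: the 32-entry fully associative LRU IOTLB and the sequential 16 MiB DMA trace are illustrative assumptions, chosen only to show why larger mappings (e.g. superpages) cut compulsory misses.

```python
from collections import OrderedDict

class IOTLB:
    """Toy model of an IOTLB: fully associative, LRU replacement."""
    def __init__(self, num_entries, page_size):
        self.num_entries = num_entries
        self.page_size = page_size
        self.lru = OrderedDict()  # page number -> present
        self.hits = 0
        self.misses = 0

    def translate(self, dma_addr):
        page = dma_addr // self.page_size
        if page in self.lru:
            self.hits += 1
            self.lru.move_to_end(page)        # refresh LRU position
        else:
            self.misses += 1                  # would trigger an I/O page-table walk
            if len(self.lru) >= self.num_entries:
                self.lru.popitem(last=False)  # evict least-recently-used entry
            self.lru[page] = True

    def miss_rate(self):
        return self.misses / (self.hits + self.misses)

# Synthetic trace: a device streaming 64-byte DMA transactions
# sequentially through a 16 MiB buffer (e.g. a NIC receive ring).
BUF_SIZE = 16 * 1024 * 1024
tlb_4k = IOTLB(num_entries=32, page_size=4 * 1024)         # 4 KiB mappings
tlb_2m = IOTLB(num_entries=32, page_size=2 * 1024 * 1024)  # 2 MiB superpage mappings
for addr in range(0, BUF_SIZE, 64):
    tlb_4k.translate(addr)
    tlb_2m.translate(addr)

print(f"4 KiB pages: {tlb_4k.misses} misses, miss rate {tlb_4k.miss_rate():.4%}")
print(f"2 MiB pages: {tlb_2m.misses} misses, miss rate {tlb_2m.miss_rate():.4%}")
```

On this streaming trace every miss is compulsory (one per distinct page), so 2 MiB mappings take 8 misses where 4 KiB mappings take 4096; real miss-rate gains depend on the actual DMA access pattern and IOTLB organization.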
