Data Separation Scheme on Lustre Metadata Server based on Multi-stream SSD

As the price of NAND-flash storage decreases, large-scale backend distributed file systems are increasingly being built as all-flash storage without HDDs. The Lustre distributed file system, which has been the most widely used file system in HPC systems, is also being deployed on NAND-flash SSDs alone. However, the performance of an SSD can drop sharply because of internal garbage collection overhead and the accompanying write amplification. Lustre provides the Data-on-MDT (DoM) feature, which stores small files directly on the Metadata Server (MDS) instead of on an Object Storage Server (OSS). Despite its benefit in reducing communication traffic, DoM fills the Metadata Target (MDT) much faster, triggering garbage collection with write amplification and drastically reducing the performance of the MDT. We therefore propose a data separation scheme that uses a multi-stream SSD to mitigate this performance degradation. A multi-stream SSD manages each stream as a group of blocks. We separate the physical placement of DoM data, normal metadata, and ldiskfs journaling data according to the lifetime of the data. Assigning different streams to these data types, which have different lifetimes, greatly reduces garbage collection overhead. Our scheme improves the I/O throughput of the MDT by 70% and its IOPS by 81% by preventing write amplification.
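
The effect of stream separation on garbage collection can be illustrated with a toy model. The sketch below is not the implementation evaluated in this work. The stream IDs, the helper names (`stream_for`, `place_writes`, `gc_copies`), the four-page block size, and the choice of journaling data as the only short-lived type are all assumptions made for illustration. It counts how many still-valid pages would have to be copied when blocks containing invalidated pages are reclaimed, with and without per-type streams.

```python
from enum import Enum


class DataType(Enum):
    JOURNAL = "journal"      # ldiskfs journaling data
    METADATA = "metadata"    # normal metadata
    DOM_DATA = "dom_data"    # small-file data stored via DoM


# Hypothetical stream IDs; stream 0 stands for "no separation".
STREAM_OF = {DataType.JOURNAL: 1, DataType.METADATA: 2, DataType.DOM_DATA: 3}


def stream_for(data_type, separate=True):
    """Return the stream ID a write of this type would be tagged with."""
    return STREAM_OF[data_type] if separate else 0


def place_writes(writes, pages_per_block, separate):
    """Append each page to the open block of its stream.

    Returns every non-empty block as a list of the data types it holds.
    """
    open_blocks = {}
    blocks = []
    for data_type in writes:
        sid = stream_for(data_type, separate)
        block = open_blocks.setdefault(sid, [])
        block.append(data_type)
        if len(block) == pages_per_block:
            blocks.append(block)
            open_blocks[sid] = []  # start a fresh block for this stream
    blocks.extend(b for b in open_blocks.values() if b)
    return blocks


def gc_copies(blocks, short_lived):
    """Count valid pages copied when reclaiming blocks with dead pages.

    Pages whose type is in `short_lived` are assumed to be invalidated;
    a block with at least one dead page is treated as a GC candidate.
    """
    copies = 0
    for block in blocks:
        dead = sum(1 for data_type in block if data_type in short_lived)
        if dead:
            copies += len(block) - dead
    return copies


# Interleaved workload: journal, metadata, and DoM pages arrive in turn.
writes = [DataType.JOURNAL, DataType.METADATA, DataType.DOM_DATA] * 4
short_lived = {DataType.JOURNAL}  # assumption: only journal pages expire

mixed = gc_copies(place_writes(writes, 4, separate=False), short_lived)
separated = gc_copies(place_writes(writes, 4, separate=True), short_lived)
print("valid pages copied without streams:", mixed)
print("valid pages copied with streams:", separated)
```

In this toy workload, mixing all three types in the same blocks forces valid metadata and DoM pages to be copied whenever journal pages expire, whereas per-type streams leave the journal blocks fully invalid and reclaimable without any copy. Real devices and workloads are far less regular, so the numbers are illustrative only.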