Tools that confidently recreate I/O workloads have become a critical requirement in designing efficient storage systems for datacenters (DCs), since potential inefficiencies are aggregated over several thousand servers. Designing performance-, power-, and cost-optimized systems requires a deep understanding of target workloads, and mechanisms to effectively model different design choices. Traditional benchmarking is invalid in cloud data-stores, and representative storage profiles are hard to obtain, while replaying the entire application in all storage configurations is impractical. Moreover, current workload generators are not comprehensive enough to accurately reproduce key aspects of real application patterns. These aspects include spatial and temporal locality, as well as the ability to tune the intensity of the workload to emulate different storage system behaviors. To address these limitations, we use a state diagram-based storage model, extend it to a hierarchical representation, and implement a tool that consistently recreates the I/O loads of DC applications. We present the design of the tool and its validation against the original traces of six DC applications. We explore the practical applications of this methodology in two important storage challenges: 1) SSD caching and 2) defragmentation benefits on enterprise storage. In both cases we observe significant storage speedup for most of the DC applications. Since knowledge of the workload’s spatial locality is necessary to model these use cases, our tool was instrumental in quantifying their performance benefits.
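
To make the idea of a hierarchical state diagram-based model concrete, the following is a minimal sketch, not the tool described above. It assumes each top-level state covers a range of logical block numbers with its own read fraction and request size, that each state is refined into sub-ranges with their own transition probabilities, and that transitions are chosen at random according to those probabilities. All names (`MODEL`, `next_state`, `generate`) and all numeric values are hypothetical and chosen for illustration only.

```python
import random

# Hypothetical two-level model. Top-level states are block ranges ("hot", "cold");
# each is refined into sub-ranges with their own transition probabilities.
MODEL = {
    "start": "hot",
    "transitions": {
        "hot": {"hot": 0.9, "cold": 0.1},
        "cold": {"hot": 0.5, "cold": 0.5},
    },
    "states": {
        "hot": {
            "read_fraction": 0.8,
            "block_size": 8192,
            "start": "a",
            "subranges": {"a": (0, 1000), "b": (1000, 2000)},
            "transitions": {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}},
        },
        "cold": {
            "read_fraction": 0.3,
            "block_size": 65536,
            "start": "a",
            "subranges": {"a": (100000, 200000)},
            "transitions": {"a": {"a": 1.0}},
        },
    },
}


def next_state(transitions, current, rng):
    """Pick the next state from `current` using its outgoing probabilities."""
    names, weights = zip(*transitions[current].items())
    return rng.choices(names, weights=weights, k=1)[0]


def generate(model, n, seed=0):
    """Emit n synthetic requests as (op, block_number, size_in_bytes) tuples."""
    rng = random.Random(seed)
    top = model["start"]
    sub = model["states"][top]["start"]
    requests = []
    for _ in range(n):
        region = model["states"][top]
        lo, hi = region["subranges"][sub]
        block = rng.randrange(lo, hi)  # spatial locality: stay inside the sub-range
        op = "R" if rng.random() < region["read_fraction"] else "W"
        requests.append((op, block, region["block_size"]))
        # Outer transition first; if we stay in the same top-level state,
        # take an inner transition among its sub-ranges instead.
        new_top = next_state(model["transitions"], top, rng)
        if new_top == top:
            sub = next_state(region["transitions"], sub, rng)
        else:
            top = new_top
            sub = model["states"][top]["start"]
    return requests


if __name__ == "__main__":
    for request in generate(MODEL, 5):
        print(request)
```

In a sketch like this, spatial locality comes from the block ranges attached to each state, and temporal behavior from the transition probabilities. Workload intensity could be tuned by attaching inter-arrival times to the generated requests and scaling them, which is omitted here.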