Paper information - OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS

OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS

Big-data frameworks such as MapReduce/Hadoop and Spark expose many performance-critical configuration parameters that can interact with each other in complex ways. The optimal values for an application on a given cluster depend not only on the application itself but also on its input data. This makes offline auto-configuration approaches hard to use in practice, because the input data of an application may change from run to run. To address this issue, we propose an Online Self-Configuring (OSC) approach that automatically determines the optimal parameter values for a given application. OSC integrates three key techniques. First, it leverages ensemble learning to build a precise performance model for a given application. Second, it quantifies the importance of each parameter and the intensity of the interactions between parameters, and uses this information to accelerate the genetic algorithm that searches for the optimal configuration. Third, it builds its models incrementally to keep modeling overhead low enough for online use. Together, these techniques allow OSC to learn the characteristics of an application effectively and to optimize its performance by automatically adjusting the configuration at runtime. Our implementation of OSC atop MapReduce/Hadoop 2.6 improves performance by 60 percent on average, and by up to 120 percent, compared with the state-of-the-art approach. Finally, the performance benefit of running an application on OSC generally increases with its input data size.
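
The abstract does not say which ensemble learner, importance measure, or genetic-algorithm variant OSC uses, so the following is only a minimal sketch of the general idea under stated assumptions. It stands in a bagged nearest-neighbour regressor for the performance model, permutation importance for the parameter ranking, and importance-weighted mutation for the accelerated search. The parameter names, the synthetic `true_runtime` function, and every helper are hypothetical, and interaction intensity and incremental modeling are left out.

```python
import random

# Hypothetical parameters, each normalized to [0, 1]; names are illustrative only.
PARAMS = ["sort_buffer", "reduce_tasks", "compress_level"]

def true_runtime(cfg):
    """Synthetic stand-in for a measured job runtime (lower is better)."""
    a, b, c = cfg
    return 10 + 40 * (a - 0.7) ** 2 + 20 * (b - 0.3) ** 2 + 0.5 * c

def knn_predict(sample, cfg, k=3):
    """Mean runtime of the k observations closest to cfg."""
    nearest = sorted(
        sample, key=lambda obs: sum((x - y) ** 2 for x, y in zip(obs[0], cfg))
    )[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)

def fit_ensemble(data, n_models, rng):
    """Bagging: every member keeps its own bootstrap resample of the data."""
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]

def predict(ensemble, cfg):
    """Ensemble prediction: average of the members' predictions."""
    return sum(knn_predict(member, cfg) for member in ensemble) / len(ensemble)

def importance(ensemble, data, rng):
    """Permutation importance: error increase when one parameter is shuffled."""
    base = sum(abs(predict(ensemble, cfg) - t) for cfg, t in data)
    scores = []
    for i in range(len(PARAMS)):
        column = [cfg[i] for cfg, _ in data]
        rng.shuffle(column)
        err = sum(
            abs(predict(ensemble, cfg[:i] + (v,) + cfg[i + 1:]) - t)
            for (cfg, t), v in zip(data, column)
        )
        scores.append(max(err - base, 1e-9))  # keep every weight positive
    total = sum(scores)
    return [s / total for s in scores]

def genetic_search(ensemble, weights, rng, pop_size=20, generations=30):
    """Minimize predicted runtime; mutate important parameters more often."""
    pop = [tuple(rng.random() for _ in PARAMS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda cfg: predict(ensemble, cfg))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            i = rng.choices(range(len(PARAMS)), weights=weights)[0]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(tuple(child))
        pop = elite + children
    return min(pop, key=lambda cfg: predict(ensemble, cfg))

def run_demo(seed=0, n_samples=60):
    """Sample configurations, fit the model, rank parameters, then search."""
    rng = random.Random(seed)
    configs = [tuple(rng.random() for _ in PARAMS) for _ in range(n_samples)]
    data = [(cfg, true_runtime(cfg)) for cfg in configs]
    ensemble = fit_ensemble(data, n_models=5, rng=rng)
    weights = importance(ensemble, data, rng)
    best = genetic_search(ensemble, weights, rng)
    return data, weights, best
```

Calling `run_demo()` returns the sampled observations, the normalized importance weights, and the configuration the search settled on. In a real deployment the synthetic `true_runtime` would be replaced by measured job runtimes, and the model would be refreshed as new runs arrive rather than fitted once.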