Multi-class Classification of Database Workloads using PCA-SVM Classifier

Driven by the growth of the information industry, many companies rely on Database Management Systems (DBMSs) to process huge amounts of data. Database administrators need workload information in order to maintain a high-performance DBMS. However, workloads are hard to identify because database applications have become diverse and complex. A method that identifies workloads automatically is therefore required in these environments. In this paper, we propose a PCA-SVM workload classifier that identifies DBMS workloads automatically. To this end, we collect workload data according to performance ratio while changing the resource parameters. We reduce the dimension of the feature vectors in the workload data by Principal Component Analysis (PCA) and classify the workloads with the one-against-all approach to multi-class Support Vector Machines (SVMs). We experimentally select an optimal PCA-SVM workload classifier by adjusting the kernel parameters of each kernel and the error-tolerance threshold, C. Experimental results show that the proposed PCA-SVM workload classifier reduces the dimension of the feature vector by a factor of 2/5 and achieves about 7% higher accuracy than other classifiers. Moreover, classification is as much as 18 times faster than without dimensionality reduction.
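
The pipeline described above (PCA for dimensionality reduction, a one-against-all multi-class SVM, and an experimental search over the kernel parameter and C) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of real workload measurements, and the feature count (20), number of classes (4), number of retained components (8), the RBF kernel, and the grid values are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for workload data (assumption: 20 features, 4 workload classes).
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=8,
    n_redundant=4,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize, project onto the leading principal components, then classify
# with one binary SVM per class (one-against-all).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),  # assumed number of components
    ("svm", OneVsRestClassifier(SVC(kernel="rbf"))),  # assumed kernel
])

# Small illustrative grid over the error-tolerance threshold C and the RBF width gamma.
param_grid = {
    "svm__estimator__C": [0.1, 1, 10],
    "svm__estimator__gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

reduced_dim = search.best_estimator_.named_steps["pca"].n_components_
test_accuracy = search.score(X_test, y_test)
print("selected parameters:", search.best_params_)
print("reduced dimension:", reduced_dim)
print("test accuracy:", round(test_accuracy, 3))
```

Placing PCA inside the pipeline means the projection is refit on each cross-validation fold, so the parameter search does not see information from the held-out fold. Other kernels could be compared by adding a `svm__estimator__kernel` entry to the grid.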