THE EFFECT OF THE DATA STORAGE TECHNIQUES ON THE ANALYSIS PERFORMANCE IN THE DATA MINING STUDIES: A SAMPLE APPLICATION WITH WEKA

In this study, the data storage techniques needed to carry out data mining applications are analysed. In a data mining application, the data pass through several stages before they reach the analysis stage. How well these stages are carried out affects both the accuracy of the analysis and its performance. The analyses should also be run on appropriate hardware, since the effective use of physical memory and processor capacity has a large bearing on analysis time and cost. The study examines two storage environments for the data to be processed by data mining applications: the database and the file. To this end, analyses were carried out on a sample dataset. By gradually increasing the amount of data analysed, we investigated whether there is a relationship between the size of the dataset and the storage environment. For this purpose, a dataset of 25 million data items was analysed. First, the Naive Bayes algorithm was run against both storage environments as the amount of data was increased, and the results were grouped under physical memory usage and CPU usage. Second, performance was assessed in terms of time by recording the duration of each analysis. The results obtained from the analyses in the database and file environments are presented and evaluated in the final section of the study.
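The measurement procedure described above can be illustrated with a minimal sketch. It is not the WEKA setup used in the study: it assumes a synthetic three-column schema, SQLite as a stand-in database, a CSV file as the file environment, and Python's `tracemalloc` as a rough proxy for memory usage. It times only the loading of the data, not the Naive Bayes run itself, and all function names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
import time
import tracemalloc


def make_rows(n):
    # Synthetic records (assumed schema): two numeric attributes and a class label.
    return [(i % 7, (i * 31) % 11, "yes" if i % 3 == 0 else "no") for i in range(n)]


def write_csv(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def write_db(path, rows):
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE records (a INTEGER, b INTEGER, label TEXT)")
    con.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()


def read_csv(path):
    # File environment: parse every row back into typed values.
    with open(path, newline="") as f:
        return [(int(a), int(b), label) for a, b, label in csv.reader(f)]


def read_db(path):
    # Database environment: fetch every row with a single query.
    con = sqlite3.connect(path)
    try:
        return con.execute("SELECT a, b, label FROM records").fetchall()
    finally:
        con.close()


def measure(loader, path):
    """Return (row_count, wall_seconds, cpu_seconds, peak_bytes) for one load."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    rows = loader(path)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    _, peak = tracemalloc.get_traced_memory()  # Python-level allocations only
    tracemalloc.stop()
    return len(rows), wall, cpu, peak


def run_benchmark(sizes):
    """Load each dataset size from both environments and collect the measurements."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for n in sizes:
            rows = make_rows(n)
            csv_path = os.path.join(tmp, f"data_{n}.csv")
            db_path = os.path.join(tmp, f"data_{n}.db")
            write_csv(csv_path, rows)
            write_db(db_path, rows)
            for name, loader, path in (("file", read_csv, csv_path),
                                       ("database", read_db, db_path)):
                count, wall, cpu, peak = measure(loader, path)
                results.append((name, n, count, wall, cpu, peak))
    return results


if __name__ == "__main__":
    for name, n, count, wall, cpu, peak in run_benchmark([1_000, 10_000, 100_000]):
        print(f"{name:8s} n={n:>7d} rows={count:>7d} "
              f"wall={wall:.4f}s cpu={cpu:.4f}s peak={peak / 1024:.0f} KiB")
```

The sizes passed to `run_benchmark` are arbitrary and far smaller than the dataset in the study. The figures it prints depend on the machine and say nothing about the results reported here.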
