BACULA BACKUP SOFTWARE CATALOG DATA MINING

Backup software is a potential source for data mining: not only the unstructured data stored from all backed-up servers, but also the backup job metadata, which is kept in what is formally known as the catalog database. Mining this database in particular could improve backup quality, automation, and reliability, predict bottlenecks, identify risks and failure trends, and provide specific reports that cannot be obtained from the closed-format databases of proprietary backup software. Neglecting this kind of data mining can be costly, leading to unnecessary human intervention, uncoordinated work, and pitfalls such as backup service disruption caused by insufficient planning. The specific goal of this practical paper is to use Knowledge Discovery in Databases (KDD), time series analysis, stochastic models, and R scripts to predict backup storage data growth. This project could not be done with traditional closed-format proprietary solutions, since their databases generally cannot be read by third-party software because of deliberate vendor lock-in. It is, however, quite feasible with Bacula, which is open source and currently the third most popular backup software worldwide. This paper focuses on the backup storage demand prediction problem, using the most popular prediction algorithms. Among them, the Holt-Winters model had the highest success rate for the tested data sets.
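To make the forecasting idea concrete, the following is a minimal sketch of additive Holt-Winters smoothing applied to a monthly storage series. It is written in Python rather than R, and it is not the paper's script: the function name, the hand-picked smoothing parameters, the simple first-two-seasons initialization, and the synthetic data are all assumptions made for illustration. In practice the series could be built from the catalog, for example by summing the `JobBytes` of finished jobs per month from Bacula's `Job` table.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast (illustrative helper, not from the paper).

    series     -- observed values, oldest first
    season_len -- observations per seasonal cycle (12 for monthly data)
    alpha, beta, gamma -- smoothing factors for level, trend, seasonality
    horizon    -- number of future periods to forecast
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons (an assumption;
    # R's HoltWinters() uses its own start values and optimizes parameters).
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]

    # Update level, trend and seasonal components for each later observation.
    for t in range(season_len, len(series)):
        y = series[t]
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s

    # Project the last level and trend forward, adding the matching season.
    n = len(series)
    return [
        level + h * trend + seasonal[(n + h - 1) % season_len]
        for h in range(1, horizon + 1)
    ]


# Synthetic monthly stored volume in TB (made-up numbers): steady growth
# plus a yearly pattern, standing in for data mined from the catalog.
yearly_pattern = [0, 1, 2, 1, 0, -1, -2, -1, 0, 1, 3, -4]
stored_tb = [50 + 1.5 * m + yearly_pattern[m % 12] for m in range(36)]

forecast_tb = holt_winters_additive(
    stored_tb, season_len=12, alpha=0.3, beta=0.1, gamma=0.2, horizon=6
)
for months_ahead, value in enumerate(forecast_tb, start=1):
    print(f"month +{months_ahead}: {value:.1f} TB")
```

The smoothing factors are fixed by hand here to keep the sketch short; a real analysis would fit them to the data and compare forecast error against alternatives before trusting the prediction for capacity planning.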
