A Nearest Neighbours-Based Algorithm for Big Time Series Data Forecasting

This work presents a forecasting algorithm for big data time series. A nearest neighbours-based strategy forms the core of the algorithm. A detailed explanation of how to adapt and implement the algorithm to handle big data is provided. Although some parts remain iterative, and consequently require an enhanced implementation, execution times are considered satisfactory. The performance of the proposed approach has been tested on real-world electricity consumption data from a public Spanish university, using a Spark cluster.
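
As an illustration of the general idea behind nearest neighbours-based forecasting, the following is a minimal single-machine sketch in Python with NumPy. It is not the implementation described in this work, which targets a Spark cluster. The function name `knn_forecast`, the parameters `w` (window length), `h` (forecast horizon) and `k` (number of neighbours), the Euclidean distance, and the inverse squared distance weighting are all assumptions chosen for the example.

```python
import numpy as np

def knn_forecast(series, w, h, k):
    """Forecast the next h values of a series from its k most similar past windows.

    Hypothetical sketch: the last w values form the query pattern, every earlier
    window of length w that is followed by h known values is a candidate, and the
    forecast is a weighted average of the values that followed the k closest
    candidates (weights are inverse squared Euclidean distances, an assumption).
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    query = series[n - w:]

    # Candidate windows must leave room for the h values that follow them.
    starts = np.arange(0, n - w - h + 1)
    if len(starts) < k:
        raise ValueError("series too short for the requested w, h and k")

    windows = np.stack([series[i:i + w] for i in starts])
    futures = np.stack([series[i + w:i + w + h] for i in starts])

    # Distance from each candidate window to the query pattern.
    dists = np.linalg.norm(windows - query, axis=1)
    nearest = np.argsort(dists)[:k]

    # A small constant avoids division by zero for exact matches.
    weights = 1.0 / (dists[nearest] ** 2 + 1e-12)
    return weights @ futures[nearest] / weights.sum()


if __name__ == "__main__":
    # Toy periodic series standing in for real consumption data.
    demo = np.tile([1.0, 2.0, 3.0, 4.0], 10)
    print(knn_forecast(demo, w=4, h=2, k=3))
```

On a perfectly periodic toy series such as the one above, the closest windows match the query exactly, so the forecast reproduces the next values of the cycle. The loop over candidate windows is the kind of step that would have to be distributed, for instance over partitions of the series, to scale to big data.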
