Big Data Quality Framework: Pre-Processing Data in Weather Monitoring Application

Big Data has become an integral part of all industries and business sectors today. Organizations in every sector, such as energy, banking, retail, hardware, and networking, generate huge quantities of heterogeneous data which, if mined, processed, and analyzed accurately, can reveal immensely useful patterns that business leaders can apply to grow their businesses. Big Data technologies help in acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable results. The quality of information is affected by the size, speed, and format in which data is generated. Hence, the quality of Big Data is of great relevance and importance. We propose addressing various aspects of the raw data to improve its quality in the pre-processing stage, as the raw data may not be usable as-is. We explore processes such as cleansing, to fix as much data as feasible, and noise filters, to remove bad data, as well as sub-processes for integration and filtering, along with data transformation/normalization. We evaluate and profile the Big Data during the acquisition stage so that it is adapted to expectations, which avoids cost overheads later and leads to more accurate data analysis. Hence, it is imperative to improve data quality even before it is absorbed and utilized in an industry's Big Data system. In this paper, we propose a Pre-Processing Framework to address the quality of data in a weather monitoring and forecasting application that also takes into account global warming parameters and raises alerts/notifications to warn users and scientists in advance.
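The pre-processing stages named above (cleansing, noise filtering, and normalization) can be illustrated with a minimal sketch. This is not the framework proposed in the paper; the record fields (`station`, `temp_c`), the plausibility bounds, and every function name are assumptions chosen only for illustration, and min-max scaling stands in for whatever normalization the framework actually uses.

```python
from typing import Dict, List, Optional

# Hypothetical plausibility bounds for air temperature in Celsius;
# an assumption for this sketch, not a value taken from the paper.
TEMP_RANGE_C = (-90.0, 60.0)


def cleanse(raw: Dict) -> Optional[Dict]:
    """Cleansing: repair what can be repaired (trim whitespace,
    parse numeric strings); return None for unrecoverable records."""
    try:
        return {
            "station": str(raw["station"]).strip(),
            "temp_c": float(str(raw["temp_c"]).strip()),
        }
    except (KeyError, ValueError):
        return None


def is_noise(record: Dict) -> bool:
    """Noise filter: flag readings outside the plausible range.
    NaN fails both comparisons, so it is flagged as noise too."""
    low, high = TEMP_RANGE_C
    return not (low <= record["temp_c"] <= high)


def normalize(records: List[Dict]) -> List[Dict]:
    """Normalization: min-max scale temp_c into [0, 1] as temp_norm."""
    if not records:
        return []
    temps = [r["temp_c"] for r in records]
    low, high = min(temps), max(temps)
    span = high - low
    return [
        {**r, "temp_norm": (r["temp_c"] - low) / span if span else 0.0}
        for r in records
    ]


def preprocess(raw_records: List[Dict]) -> List[Dict]:
    """Run cleansing, then noise filtering, then normalization."""
    cleansed = [c for c in map(cleanse, raw_records) if c is not None]
    filtered = [r for r in cleansed if not is_noise(r)]
    return normalize(filtered)
```

In this sketch, records that cannot be parsed are dropped during cleansing rather than repaired further, and out-of-range readings are discarded before scaling so that a single bad value does not distort the normalized range.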
