Data Preprocessing Quality Management Procedure for Improving Big Data Applications Efficiency and Practicality

Diverse network applications are now fully integrated with people’s daily activities and lives. Network activities generate and record large amounts of data that carry implicit business value for enterprises and organizations. By collecting, analyzing, and visualizing these data, intelligent information can be extracted efficiently. Big data applications can help enterprises strengthen their competitive advantage in the market and help government units improve people’s quality of daily life. However, big data collected from network and IoT (Internet of Things) environments contain many quality defects and problems that remain to be resolved. The data quality of big data directly affects the analysis results and may cause wrong decisions, inaccurate predictions, and imperfect planning and arrangements. Data preprocessing is an important procedure in big data applications, and ensuring the quality of data preprocessing tasks has become a major concern. Based on review activities, this paper proposes the Preprocessing Tasks Quality Measurement (PTQM) model to identify the quality defects of data preprocessing tasks. Applying the Data Preprocessing Quality Management (DPQM) procedure allows these defects to be corrected in a timely manner, increasing the efficiency and practicality of big data applications.