An Iterative and Incremental Data Preprocessing Procedure for Improving the Risk of Big Data Project

Big data applications can strengthen the competitive advantage of enterprises and organizations and can improve people’s quality of life. However, owing to many factors, the failure rate of big data projects is higher than that of typical IT projects. To reduce the risk of failure, big data projects must overcome a series of challenges. Ambiguous requirements, poor data quality, and a lack of changeability and extensibility directly affect the results of big data analytics and can even lead to wrong decisions, inaccurate predictions, and improper planning, all of which expose big data projects to the risk of failure. To address this, this paper applies iterative and incremental development (IID) to data preprocessing and proposes an iterative and incremental data quality improvement (IIDQI) procedure. The IIDQI procedure iteratively detects and identifies data quality defects, incrementally strengthens big data quality, and controls the factors that contribute to failure risk. Iterative inspection activities can effectively enhance data quality, communication efficiency, and requirements precision, thereby reducing the risk of big data project failure.
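The iterate-and-increment idea can be illustrated with a minimal sketch. It is not the paper's IIDQI procedure, whose steps the abstract does not specify. It assumes, purely for illustration, that each iteration adds one more quality rule (the increment) and re-runs every rule adopted so far (the iteration), recording how many defects each rule finds. The function `run_iidqi_sketch`, the `age` field, the fill value 30.0, and the range 0 to 120 are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
# A check is (name, defect detector, repair function); all hypothetical.
Check = Tuple[str, Callable[[Record], bool], Callable[[Record], Record]]


def run_iidqi_sketch(
    records: List[Record], checks: List[Check]
) -> Tuple[List[Record], List[Dict[str, int]]]:
    """Add one check per iteration and re-run all checks adopted so far."""
    data = [dict(r) for r in records]  # work on a copy of the input
    history: List[Dict[str, int]] = []
    active: List[Check] = []
    for check in checks:
        active.append(check)  # increment: one more quality rule
        report: Dict[str, int] = {}
        for name, is_defect, repair in active:  # iteration: re-inspect everything
            defects = [i for i, r in enumerate(data) if is_defect(r)]
            report[name] = len(defects)
            for i in defects:
                data[i] = repair(data[i])
        history.append(report)
    return data, history


# Hypothetical sample data and rules (assumptions, not from the paper).
records: List[Record] = [{"age": 34.0}, {"age": None}, {"age": -5.0}, {"age": 210.0}]
checks: List[Check] = [
    (
        "missing_age",
        lambda r: r["age"] is None,
        lambda r: {**r, "age": 30.0},  # assumed default fill value
    ),
    (
        "age_out_of_range",
        lambda r: r["age"] is not None and not (0.0 <= r["age"] <= 120.0),
        lambda r: {**r, "age": min(max(r["age"], 0.0), 120.0)},  # clamp to range
    ),
]

cleaned, history = run_iidqi_sketch(records, checks)
for iteration, report in enumerate(history, start=1):
    print(f"iteration {iteration}: {report}")
```

In this sketch the per-iteration defect counts in `history` stand in for the kind of inspection record that could feed back into requirements discussions. A real pipeline would replace the lambda rules with checks agreed with stakeholders.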
