Research and Implementation of the Platform for Analyzing Data Quality

As redundant and dirty data accumulate in information systems, the problem of data quality is becoming increasingly urgent. Data quality is usually analyzed with the tools provided by database management systems, which is inconvenient and inefficient. This article introduces a novel integrated platform for data quality analysis that loads, compares, and verifies business data using predefined regular expressions and a metamodel. It also presents a detailed set of analysis indexes that contain the rules for evaluating data quality. An implementation in a labor market information system shows that the platform is well suited to data quality analysis.
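As an illustration of verifying business data against predefined regular expressions, the following is a minimal sketch and not the platform's actual implementation. The field names, the patterns in `RULES`, and the helper functions `verify_record` and `pass_rate` are all assumptions made for this example; the pass rate stands in for one possible analysis index.

```python
import re

# Hypothetical rule set: field name -> pattern that a valid value must fully match.
# These fields and patterns are illustrative assumptions, not taken from the article.
RULES = {
    "id_number": re.compile(r"\d{18}"),
    "phone": re.compile(r"\d{11}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def verify_record(record, rules=RULES):
    """Return the names of fields that are missing or fail their pattern."""
    failures = []
    for field, pattern in rules.items():
        value = record.get(field)
        if value is None or pattern.fullmatch(str(value)) is None:
            failures.append(field)
    return failures


def pass_rate(records, rules=RULES):
    """Fraction of records with no rule violations (one possible quality index)."""
    if not records:
        return 0.0
    clean = sum(1 for r in records if not verify_record(r, rules))
    return clean / len(records)
```

Keeping the rules in a plain mapping separates the rule definitions from the checking logic, so new fields can be covered by adding entries rather than changing code.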
