Evaluation on Availability of Entity Information in Cyberspace

In the information era, entity information in cyberspace plays an important role in many business scenarios. With availability as the evaluation criterion, more value can be mined from better entity data. However, invalid and erroneous data reduce the availability of entity information and degrade the business applications that depend on it. This paper proposes AEIC, a framework for evaluating the availability of entity information in cyberspace. To identify specific data components and support availability evaluation, AEIC introduces an algorithm for identifying attribute-missing data and improves an existing algorithm for identifying similar duplicate data. In the availability evaluation, the indicator weights are calculated using the analytic hierarchy process (AHP). Our empirical study shows that AEIC is more effective and efficient than existing approaches at both specific component identification and availability evaluation, across data with different business-authorized categories of entity information in cyberspace.
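As an illustration of the weighting step, the following is a minimal sketch of how indicator weights can be derived with the analytic hierarchy process. It is not the AEIC implementation: the three indicators (completeness, uniqueness, accuracy) and the pairwise comparison values are hypothetical, and the row geometric mean method is used here as a common approximation of the principal eigenvector. The function names `ahp_weights` and `consistency_ratio` are likewise assumptions made for this example.

```python
import math

# Saaty's random consistency index (RI) for comparison matrices of size 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return the consistency ratio CR = CI / RI for a comparison matrix."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons (not taken from the paper):
# rows and columns are completeness, uniqueness, accuracy.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
```

A consistency ratio below 0.1 is the conventional threshold for accepting the pairwise judgments. An overall availability score could then be formed as the weighted sum of the per-indicator scores, although the exact aggregation used by AEIC is not stated in this abstract.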
