An optimized data integration model based on reverse cleaning for heterogeneous multi-media data

With the continuous development of information technology, multi-media data of many kinds are constantly emerging and are increasingly autonomous and heterogeneous. How to integrate and analyze such data correctly and efficiently has become a challenging problem. First, to improve the quality of the integrated data, two real-time threads combined with a data adapter are used to monitor the heterogeneous data sources and refresh necessary updates efficiently; once the original data has been updated, the real-time data is promptly loaded into the data center. Second, a data reverse cleaning method is proposed to improve data quality. It uses the data source tree built during the data integration process to quickly find the location of the original data after reverse cleaning. Finally, a data accuracy assessment algorithm based on a Bayesian network and the path condition algorithm is designed for data quality assessment. Experimental results show that the quality of the integrated data is significantly higher than that of the original data.
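The abstract does not specify how the data source tree is structured, so the following is only a minimal sketch of the general idea: record where each integrated record came from while loading it, so that a record flagged during cleaning can be traced back to its original location. The class names (`SourceNode`, `DataCenter`), the three-level layout (source, table, key), and the sample identifiers are all assumptions made for illustration, not the paper's design.

```python
class SourceNode:
    """One node of a hypothetical data source tree (source -> table -> key)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def child(self, name):
        # Return the existing child, creating it on first use.
        if name not in self.children:
            self.children[name] = SourceNode(name, parent=self)
        return self.children[name]

    def path(self):
        # Walk parent links up to the root, then reverse to root-first order.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return parts[::-1]


class DataCenter:
    """Hypothetical integration target that remembers each record's origin."""

    def __init__(self):
        self.root = SourceNode("root")
        self.origin = {}  # integrated record id -> leaf node in the tree

    def load(self, record_id, source, table, key):
        # Build (or reuse) the branch for this origin and remember the leaf.
        leaf = self.root.child(source).child(table).child(key)
        self.origin[record_id] = leaf

    def locate(self, record_id):
        # Return the origin path without the synthetic root node.
        return self.origin[record_id].path()[1:]


dc = DataCenter()
dc.load("r1", "video_db", "clips", "42")
dc.load("r2", "video_db", "clips", "43")
print(dc.locate("r1"))  # ['video_db', 'clips', '42']
```

Storing a direct reference to the leaf node makes the lookup a single dictionary access followed by a walk up the tree, which is one plausible reading of "find the location of the original data quickly".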
