Data Integration - Challenges, Techniques and Future Directions: A Comprehensive Study

Objectives: This paper studies various query reformulation techniques, which are used to convert the intermediate schema to the target schema. Techniques such as ontology-based information integration and data integration languages are also reviewed. Methods/Statistical Analysis: This paper discusses techniques used to integrate data and to resolve inconsistencies in the integrated data. Data integration techniques mainly focus on integrating data at several levels and applying independent or unified queries over the available data. Findings: The analysis of the various techniques has identified several shortcomings and scope for improvement in the available techniques. The identified research directions include the vertical enhancement of wrappers by using a single unified wrapper for all the data sources. Optimizing queries according to the data source is another major requirement for providing efficient, faster results and reducing data retrieval latency. The paper also advocates other research directions, including identifying duplicates in the retrieved data and applying effective elimination strategies to reduce space consumption. Identifying conflicts and applying strategies to eliminate them is another major area with considerable scope for improvement. Application/Improvements: The survey also recommends further work in the area of data integration techniques.
