A Proposal to Maintain the Semantic Balance in Cluster-based Data Integration Systems

Given the large volume of data sources on the Web, a system is needed that integrates them so that users can query them transparently. To make queries efficient, integration systems can group these sources into clusters according to the semantic similarity of their schemas. However, the sources are autonomous: they may evolve their schemas and may join or leave the integration system at any time. This autonomy can cause a problem that we define as the semantic unbalance of clusters. Semantic unbalance can compromise the formation of clusters and, consequently, the efficiency of submitted queries. In this paper, we propose a solution for maintaining the semantic balance of clusters in dynamic data integration systems based on self-organization. We also introduce a measure to evaluate how semantically unbalanced the clusters are.
