Indexing source descriptions based on defined classes

Scaling heterogeneous information systems (HIS) to thousands of sources poses particular challenges for source discovery. It requires a powerful formalism for describing the contents of the sources concisely and for formulating compatible queries, as well as a suitable structure for indexing and retrieving the source descriptions efficiently. We propose an extended logic-based description formalism for large-scale HIS with structured sources and a shared ontology. The formalism refines, in several ways, existing approaches that describe sources by constraints on attribute value ranges. First, it allows for complex, nested descriptions based on defined classes. Second, it supports alternative descriptions to express that a source may be discovered by different combinations of constraints. Finally, it allows the matching behavior to be adjusted between positive matching, similar to keyword-based discovery, and negative matching, as used in existing logic-based approaches. We further propose the SDC-Tree for indexing such source descriptions. To allow for efficient discovery, the SDC-Tree features multidimensional indexing capabilities for the different attributes and the IS-A hierarchy of the shared ontology, and it also incorporates the existence or absence of constraints. For this purpose, it supports three different types of node split operations, which exploit the expressiveness of the description formalism. We therefore also propose a generic split algorithm that can be used with arbitrary ontologies.
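The difference between positive and negative matching, and the role of alternative descriptions, can be illustrated with a minimal Python sketch. It is not the formalism or the SDC-Tree proposed here. It assumes that a description is a flat set of closed numeric ranges per attribute (no nesting, no defined classes, no IS-A hierarchy), that negative matching rejects a source only when one of its constraints contradicts the query, and that positive matching additionally requires the source to state a constraint for every queried attribute. All names (`matches`, `discover`, the example sources and attributes) are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]        # closed range (low, high); an assumption
Description = Dict[str, Interval]     # attribute name -> constrained value range


def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed ranges overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def matches(desc: Description, query: Description, positive: bool) -> bool:
    """Match one description against a query.

    Negative matching (positive=False): a missing constraint is no obstacle;
    the description fails only on a contradicting (disjoint) range.
    Positive matching (positive=True): every queried attribute must also be
    constrained by the description, and the ranges must overlap.
    """
    for attr, q_range in query.items():
        if attr not in desc:
            if positive:
                return False
            continue
        if not overlaps(desc[attr], q_range):
            return False
    return True


def discover(sources: Dict[str, List[Description]],
             query: Description, positive: bool = False) -> List[str]:
    """A source is discovered if any of its alternative descriptions matches."""
    return sorted(name for name, alternatives in sources.items()
                  if any(matches(d, query, positive) for d in alternatives))


# Hypothetical sources; "weather" has two alternative descriptions.
sources = {
    "hotels": [{"price": (50.0, 200.0)}],
    "weather": [{"temperature": (-30.0, 50.0)}, {"humidity": (0.0, 100.0)}],
}

query = {"price": (100.0, 150.0)}
negative_hits = discover(sources, query, positive=False)
positive_hits = discover(sources, query, positive=True)
```

Under these assumptions, negative matching returns both sources for the price query, because the weather source states nothing that contradicts it, whereas positive matching returns only the source that actually constrains `price`. A linear scan like this is exactly what an index structure over the descriptions is meant to avoid at the scale of thousands of sources.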
