Modeling heterogeneous data in dataspace

Recently, dataspace has been proposed as a new architecture in the evolution of information integration. However, how to fulfill the dataspace vision of pay-as-you-go integration is still an open issue, and the first challenge is data modeling. In this paper, we present a flexible data model called the triple model, which is similar to RDF but better suited to dataspace applications. Information in an arbitrary data model can be uniformly represented by the triple model in a dataspace. According to predefined rules, each information item is decomposed into a set of structures called triples, and all triples are loosely connected to form a unified view without reconciling semantic heterogeneity up front. Users can then query or search all heterogeneous information sources in a uniform way. Further, on user demand, an existing dataspace can be incrementally enhanced through tighter integration.
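To make the decomposition idea concrete, the following is a minimal sketch, not the model defined in the paper. It assumes an RDF-like (subject, predicate, object) shape for a triple and a single, hypothetical decomposition rule for nested records; the names `Triple`, `decompose`, `match`, and the identifiers `db:person/1` and `fs:report.pdf` are all illustrative.

```python
from typing import Any, Dict, Iterator, List, NamedTuple, Optional


class Triple(NamedTuple):
    """Assumed RDF-like shape; the paper's actual triple structure may differ."""
    subject: str
    predicate: str
    obj: Any


def decompose(item_id: str, item: Dict[str, Any]) -> Iterator[Triple]:
    """Hypothetical rule: one triple per attribute; a nested record becomes
    a new subject that is linked to its parent by an extra triple."""
    for key, value in item.items():
        if isinstance(value, dict):
            child_id = f"{item_id}/{key}"
            yield Triple(item_id, key, child_id)
            yield from decompose(child_id, value)
        else:
            yield Triple(item_id, key, value)


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Any = None) -> List[Triple]:
    """Uniform lookup over all triples; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Two heterogeneous sources: a relational row and nested file metadata.
row = {"name": "Alice", "email": "alice@example.org"}
file_meta = {"title": "report.pdf", "author": {"name": "Alice"}}

# Triples from both sources sit side by side; no schema is reconciled first.
store = list(decompose("db:person/1", row)) + list(decompose("fs:report.pdf", file_meta))

hits = match(store, predicate="name", obj="Alice")
```

In this sketch, `hits` contains one triple from each source, because both happen to use the attribute `name`. Nothing has been done to reconcile the two schemas; deciding that `db:person/1` and `fs:report.pdf/author` denote the same person would be one of the tighter integrations added later on demand.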
