Principles of dataspace systems

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP's ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.
