iDM: a unified and versatile data model for personal dataspace management

Personal Information Management Systems require a powerful and versatile data model that is able to represent a highly heterogeneous mix of data such as relational data, XML, file content, folder hierarchies, emails and email attachments, data streams, RSS feeds and dynamically computed documents, e.g. ActiveXML [3]. Interestingly, no approach proposed so far is able to represent all of the above data in a single, powerful yet simple data model. This paper fills this gap. We present the iMeMex Data Model (iDM) for personal information management. iDM is able to represent unstructured, semi-structured and structured data inside a single model. Moreover, iDM is powerful enough to represent graph-structured data, intensional data, and infinite data streams. Further, our model makes it possible to represent the structural information available inside files. As a consequence, the artificial boundary between the inside and the outside of a file is removed, which enables a new class of queries. Because iDM allows the whole personal dataspace [20] of a user to be represented in a single model, it is the foundation of the iMeMex Personal Dataspace Management System (PDSMS) [16, 14, 47]. This paper also presents results from an evaluation of an initial iDM implementation in iMeMex, which show that iDM can be supported efficiently in a real PDSMS.
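To make the idea of a single model spanning folders, files, file-internal structure and streams more concrete, the following is a minimal Python sketch. It is not the formal iDM definition from the paper: the `Node` class, its fields (`attrs`, `content`, `children`) and the `walk` helper are hypothetical names chosen purely for illustration. The sketch assumes that every item, whether a folder, a file, a section inside a file or a data stream, is the same kind of node, and that content and children are produced lazily so that computed (intensional) or infinite data can be represented.

```python
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Callable, Dict, Iterable, Iterator, Optional, Set


@dataclass
class Node:
    """Hypothetical uniform node; an illustration, not the paper's definition."""
    name: str
    # Structured part: attribute/value pairs such as size or kind.
    attrs: Dict[str, object] = field(default_factory=dict)
    # Unstructured part: produced on demand, so it may be computed or infinite.
    content: Callable[[], Iterable[str]] = lambda: iter(())
    # Related nodes: also produced on demand; cycles (graphs) are allowed.
    children: Callable[[], Iterable["Node"]] = lambda: iter(())


def walk(node: Node, seen: Optional[Set[int]] = None) -> Iterator[Node]:
    """Visit each reachable node once, so graph-shaped data terminates."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return
    seen.add(id(node))
    yield node
    for child in node.children():
        yield from walk(child, seen)


# A section *inside* a file is an ordinary node, just like the file and folder.
section = Node("Introduction", {"kind": "section"},
               content=lambda: ["Personal dataspaces ..."])
paper = Node("paper.tex", {"kind": "file", "size": 1234},
             children=lambda: [section])
folder = Node("Projects", {"kind": "folder"}, children=lambda: [paper])

# An infinite stream is a node whose content never ends; it is read lazily.
feed = Node("feed", {"kind": "stream"},
            content=lambda: (f"item {i}" for i in count()))

# One traversal crosses the folder/file boundary without special cases.
sections = [n.name for n in walk(folder) if n.attrs.get("kind") == "section"]
first_items = list(islice(feed.content(), 3))
```

Here `sections` evaluates to `["Introduction"]` and `first_items` to `["item 0", "item 1", "item 2"]`. The point of the sketch is only that a query over the folder hierarchy can reach structure inside a file through the same traversal, and that lazily produced content lets an unbounded stream sit in the same model as finite files.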

[1]  Ben Shneiderman,et al.  Response time and display rate in human performance with computers , 1984, CSUR.

[2]  Setrag Khoshafian,et al.  A decomposition storage model , 1985, SIGMOD Conference.

[3]  Ramez Elmasri,et al.  Fundamentals of Database Systems , 1989 .

[4]  Peter M. Schwarz,et al.  The Rufus System: Information Organization for Semi-Structured Data , 1993, VLDB.

[5]  Craig A. Knoblock,et al.  Retrieving and Integrating Data from Multiple Information Sources , 1993, Int. J. Cooperative Inf. Syst..

[6]  Dragutin Petkovic,et al.  The query by image content (QBIC) system , 1995, SIGMOD '95.

[7]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[8]  David Gelernter,et al.  Lifestreams: a storage model for personal data , 1996, SGMD.

[9]  Joann J. Ordille,et al.  Querying Heterogeneous Information Sources Using Source Descriptions , 1996, VLDB.

[10]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[11]  Stanley B. Zdonik,et al.  “Data in your face”: push technology in perspective , 1998, SIGMOD '98.

[12]  Serge Abiteboul,et al.  On views and XML , 1999, PODS '99.

[13]  Douglas K. Barry,et al.  The Object Data Standard: ODMG 3.0 , 2000 .

[14]  F. E.,et al.  A Relational Model of Data for Large Shared Data Banks , 2000 .

[15]  Donald Kossmann,et al.  The state of the art in distributed query processing , 2000, CSUR.

[16]  Ioana Manolescu,et al.  Dynamic XML documents with distribution and replication , 2003, SIGMOD '03.

[17]  Oren Etzioni,et al.  Crossing the Structure Chasm , 2003, CIDR.

[18]  Andrew Trotman,et al.  Narrowed Extended XPath I (NEXI) , 2004, INEX.

[19]  Tom M. Mitchell,et al.  Inferring Ongoing Activities of Workstation Users by Clustering Email , 2004, CEAS.

[20]  Andrew McCallum,et al.  Extracting social networks and contact information from email and the Web , 2004, CEAS.

[21]  Tom M. Mitchell,et al.  Learning to Classify Email into “Speech Acts” , 2004, EMNLP.

[22]  Laks V. S. Lakshmanan,et al.  Colorful XML: one hierarchy isn't enough , 2004, SIGMOD '04.

[23]  David Maier,et al.  From databases to dataspaces: a new abstraction for information management , 2005, SGMD.

[24]  Ning Li,et al.  Hubble: An Advanced Dynamic Folder Technology for XML , 2005, VLDB.

[25]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[26]  Alon Y. Halevy,et al.  A Platform for Personal Information Management and Integration , 2005, CIDR.

[27]  Andrew McCallum,et al.  The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email , 2005 .

[28]  Gerhard Weikum,et al.  The SphereSearch Engine for Unified Ranked Retrieval of Heterogeneous XML and Web Documents , 2005, VLDB.

[29]  Donald Kossmann,et al.  iMeMex: Escapes from the Personal Information Jungle , 2005, VLDB.

[30]  Donald Kossmann,et al.  AGILE: adaptive indexing for context-aware information filters , 2005, SIGMOD '05.

[31]  Serge Abiteboul,et al.  Exchanging intensional XML data , 2003, TODS.

[32]  Jayant Madhavan,et al.  Reference reconciliation in complex information spaces , 2005, SIGMOD '05.

[33]  David R. Karger,et al.  Haystack: A General-Purpose Information Management Tool for End Users Based on Semistructured Data , 2005, CIDR.

[34]  Qiang Chen,et al.  Aurora: a new model and architecture for data stream management , 2006 .

[35]  Marcos Antonio,et al.  iMeMex: A Platform for Personal Dataspace Management , 2006 .

[36]  Susan T. Dumais,et al.  Fast, flexible filtering with phlat , 2006, CHI.

[37]  Susan T. Dumais,et al.  Fast, Flexible Filtering with Phlat — Personal Search and Organization Made Easy , 2006 .