A Unified Approach to Structured, Semistructured and Unstructured Data

1. Department of Computer Science, University of Bologna, Mura Anteo Zamboni 7, 40127 Bologna, Italy
2. Department of Mathematics and Informatics, University of Camerino, Via Madonna delle Carceri 9, 62032 Camerino MC, Italy

Currently, the way we manage data depends on its structural features. In this report we propose a logical model and algebra that take a further step toward bridging the gap between different data modeling approaches, focusing in particular on structured and semistructured data. Our model is based on set theory, as in the relational context, and on data graphs: logical data structures that are simple, mathematically defined, and highly expressive. Our approach is parametric: it can be adapted to heterogeneous application contexts simply by tuning its parameters. The algebra consists of a small, expressive set of operators. It operates on large collections of data instances and is orthogonal to the language used to manipulate those instances internally. In this way, we clearly distinguish between two levels of data manipulation: the internal level, for navigating and modifying data graphs, and the external level, for manipulating sets of instances.
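The separation between the internal and external levels can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical encoding of a data graph (a root node, labeled edges, and atomic values on nodes) and hypothetical operator names `select`, `transform`, and `union`; none of these are the report's formal definitions. The external operators work on sets of instances and receive the graph-level logic as an ordinary function, which is one way to mirror the orthogonality described above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


# Hypothetical data graph encoding (illustrative only, not the report's
# formal definition): a root, (node, value) pairs, and labeled edges.
@dataclass(frozen=True)
class DataGraph:
    root: str
    values: Tuple[Tuple[str, object], ...] = ()   # (node, atomic value)
    edges: Tuple[Tuple[str, str, str], ...] = ()  # (source, label, target)

    # Internal level: navigate from the root along a path of edge labels.
    def follow(self, path: List[str]) -> List[str]:
        frontier = [self.root]
        for label in path:
            frontier = [t for n in frontier
                        for (s, lab, t) in self.edges if s == n and lab == label]
        return frontier

    # Internal level: read the atomic value stored at a node, if any.
    def value(self, node: str) -> Optional[object]:
        return dict(self.values).get(node)


# External level: set-based operators over collections of instances.
# Graph-level logic is passed in as a plain function.
Collection = FrozenSet[DataGraph]


def select(coll: Collection, pred: Callable[[DataGraph], bool]) -> Collection:
    return frozenset(g for g in coll if pred(g))


def transform(coll: Collection, f: Callable[[DataGraph], DataGraph]) -> Collection:
    return frozenset(f(g) for g in coll)


def union(a: Collection, b: Collection) -> Collection:
    return a | b


# Usage: semistructured instances, where the "age" edge may be missing.
def person(pid: str, name: str, age: Optional[int] = None) -> DataGraph:
    values = [(pid + "/name", name)]
    edges = [(pid, "name", pid + "/name")]
    if age is not None:
        values.append((pid + "/age", age))
        edges.append((pid, "age", pid + "/age"))
    return DataGraph(root=pid, values=tuple(values), edges=tuple(edges))


def is_adult(g: DataGraph) -> bool:
    ages = [g.value(n) for n in g.follow(["age"])]
    return any(isinstance(a, int) and a >= 18 for a in ages)


def drop_label(label: str) -> Callable[[DataGraph], DataGraph]:
    # Internal level: remove every edge carrying the given label.
    def f(g: DataGraph) -> DataGraph:
        return DataGraph(g.root, g.values, tuple(e for e in g.edges if e[1] != label))
    return f


people = frozenset({person("p1", "Ada", 36), person("p2", "Bob", 17), person("p3", "Cy")})
adults = select(people, is_adult)
names = {g.value(n) for g in adults for n in g.follow(["name"])}
no_age = transform(adults, drop_label("age"))
```

Here `is_adult` and `drop_label` could be written in any graph-manipulation language; `select` and `transform` only require that they map a graph to a boolean or to another graph. An instance without an `age` edge is simply not selected, rather than raising an error.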