Describing and querying hierarchical XML structures defined over the same textual data

Our work aims to represent and query several hierarchical XML structures defined over the same textual data. We call such data "multistructured textual documents". Our objectives are twofold. First, we define a suitable, XML-compatible data model that makes it possible (1) to describe several independent hierarchical structures over the same textual data, each represented by its own structured XML document, and (2) to take into account user annotations added to each structured document. Our proposal is based on hedges, the foundation of the grammar language RELAX NG. Second, we propose an extension of XQuery for querying structures and content concurrently. We apply our proposals to a literary text written in Old French.
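To make the notion of a multistructured textual document more concrete, the following is a minimal sketch in Python. It is not the data model or the XQuery extension proposed here. The two XML fragments, the element names (`page`, `line`, `text`, `s`), the sample sentence, and the helpers `spans` and `proper_overlaps` are all hypothetical and chosen only for illustration. The sketch checks that two independent structures cover the same text and lists the element pairs that overlap without nesting, which is what prevents merging them into a single XML tree.

```python
import xml.etree.ElementTree as ET

# Two hypothetical structures over the same text: a physical one (lines)
# and a syntactic one (sentences). Both are assumptions for illustration.
PHYSICAL = "<page><line>Li rois apela son </line><line>fil. Il vint.</line></page>"
SYNTACTIC = "<text><s>Li rois apela son fil.</s> <s>Il vint.</s></text>"


def spans(xml_source):
    """Return (text, spans): the concatenated character data and, for each
    element, a (tag, start, end) triple of character offsets into that text."""
    root = ET.fromstring(xml_source)
    pieces = []
    found = []
    pos = 0

    def walk(elem):
        nonlocal pos
        start = pos
        if elem.text:
            pieces.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                pieces.append(child.tail)
                pos += len(child.tail)
        found.append((elem.tag, start, pos))

    walk(root)
    # Order by start offset, outermost element first when starts coincide.
    found.sort(key=lambda s: (s[1], -s[2]))
    return "".join(pieces), found


def proper_overlaps(spans_a, spans_b):
    """Return pairs of spans that intersect while neither contains the other."""
    result = []
    for a in spans_a:
        for b in spans_b:
            a_then_b = a[1] < b[1] < a[2] < b[2]
            b_then_a = b[1] < a[1] < b[2] < a[2]
            if a_then_b or b_then_a:
                result.append((a, b))
    return result


physical_text, physical_spans = spans(PHYSICAL)
syntactic_text, syntactic_spans = spans(SYNTACTIC)

# Both structures must describe exactly the same textual data.
same_text = physical_text == syntactic_text
overlaps = proper_overlaps(physical_spans, syntactic_spans)

if __name__ == "__main__":
    print("same text:", same_text)
    for line_span, sentence_span in overlaps:
        print("overlap:", line_span, sentence_span)
```

In this toy example the second line begins in the middle of the first sentence, so the two elements overlap and neither hierarchy can be nested inside the other. Keeping each structure in its own XML document, anchored to the shared text, avoids that conflict.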
