Formal modeling of multistructured documents

The quantity of digital documents available keeps growing. The various contexts in which such documents are used call for several kinds of descriptions of their contents and structures, so the same document can be described according to several concurrent structures. Designing models and tools that exploit these structures simultaneously is a real challenge. To this end, we have built document repositories and proposed fragmentation techniques to handle the issues raised by multistructured documents (representation, storage, reconstruction, and management of concurrent structures). This paper presents the formal model: it describes precisely and concisely the concepts related to multistructured documents, as well as the rules governing the organization of these documents.
