This paper presents a data model for transforming and assembling document information such as SGML or XML documents. Its main advantage over other data models is that it simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation. Patterns capture conditions on subordinates, while contextual conditions capture conditions on superiors, siblings, subordinates of siblings, etc.; both have been recognized in the document processing community as highly important mechanisms for identifying document components. Schema transformation, meanwhile, has been recognized as crucial in the database community since the RDB. However, no data model has provided all three of patterns, contextual conditions, and schema transformation. This data model is based on forest-regular language theory. A schema is a forest automaton and an instance is a finite set of forests (sequences of trees). Since the parse tree set of an extended context-free grammar is accepted by a forest automaton, this model is a generalization of Gonnet and Tompa's grammatical model. Patterns are captured as forest automata; contextual conditions are captured as pointed forest representations (a variation of Podelski's pointed tree representations). Controlled by patterns and contextual conditions, an operator creates an instance from an input instance and also creates a reasonably small schema from an input schema. Furthermore, the created schema is often minimally sufficient; any forest permitted by it may be generated by some input instance.
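To make the notion of a schema as a forest automaton more concrete, the following is a minimal sketch and not the paper's formal construction. It assumes a deterministic, bottom-up simplification in which each state is a single character and the allowed sequences of child states are given as Python regular expressions. The names `Tree`, `ForestAutomaton`, `schema`, `good`, and `bad`, as well as the toy `doc`/`title`/`para` vocabulary, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tree:
    """A labeled node whose children form a forest (a sequence of trees)."""
    label: str
    children: Tuple["Tree", ...] = ()


@dataclass
class ForestAutomaton:
    # Each rule is (node label, regex over the string of child states, state).
    # States are single characters so that a state sequence is a plain string.
    rules: List[Tuple[str, str, str]]
    # Regex over the string of states assigned to the top-level trees.
    final: str

    def state_of(self, tree: Tree) -> Optional[str]:
        """Assign a state to a tree bottom-up, or None if no rule applies."""
        child_states = self.run(tree.children)
        if child_states is None:
            return None
        for label, pattern, state in self.rules:
            if label == tree.label and re.fullmatch(pattern, child_states):
                return state
        return None

    def run(self, forest: Tuple[Tree, ...]) -> Optional[str]:
        """Map a forest to its string of states, or None on failure."""
        states = []
        for tree in forest:
            state = self.state_of(tree)
            if state is None:
                return None
            states.append(state)
        return "".join(states)

    def accepts(self, forest: Tuple[Tree, ...]) -> bool:
        """A forest is accepted if its top-level state string matches `final`."""
        states = self.run(forest)
        return states is not None and re.fullmatch(self.final, states) is not None


# Hypothetical schema: a single doc holding one title followed by one or more paras.
schema = ForestAutomaton(
    rules=[("title", "", "t"), ("para", "", "p"), ("doc", "tp+", "d")],
    final="d",
)

good = (Tree("doc", (Tree("title"), Tree("para"), Tree("para"))),)
bad = (Tree("doc", (Tree("para"), Tree("title"))),)
```

In this sketch, `schema.accepts(good)` holds and `schema.accepts(bad)` does not, because the children of `doc` in `bad` produce the state string `pt`, which does not match `tp+`. An instance in the sense of the abstract would then be a finite set of such forests, each accepted by the schema.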
[1] Ricardo A. Baeza-Yates, et al. Integrating contents and structure in text retrieval. SGMD, 1996.
[2] Makoto Murata, et al. Transformation of Documents and Schemas by Patterns and Contextual Conditions. PODP, 1996.
[3] Frank Wm. Tompa, et al. Shortening the OED: experience with a grammar-defined database. TOIS, 1992.
[4] Serge Abiteboul, et al. Foundations of Databases. 1994.
[5] Serge Abiteboul, et al. From structured documents to novel query facilities. SIGMOD '94, 1994.
[6] Masako Takahashi, et al. Generalizations of Regular Sets and Their Application to a Study of Context-Free Languages. Inf. Control., 1975.
[7] Andreas Podelski, et al. A monoid approach to tree automata. Tree Automata and Languages, 1992.
[8] Serge Abiteboul, et al. Querying and Updating the File. VLDB, 1993.
[9] David Maier, et al. Readings in Object-Oriented Database Systems. 1989.
[10] Gaston H. Gonnet, et al. Mind Your Grammar: a New Approach to Modelling Text. VLDB, 1987.
[11] Dirk Van Gucht, et al. Concepts for Modeling and Querying List-Structured Data. Inf. Process. Manag., 1994.
[12] Marc Gyssens, et al. A grammar-based approach towards unifying hierarchical data models. SIGMOD '89, 1989.