SGML and XML Document Grammars and Exceptions

The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow users to define document type definitions (DTDs), which are essentially extended context-free grammars expressed in a notation similar to extended Backus–Naur form. The right-hand side of a production, called a content model, is both an extended regular expression (SGML, for example, adds the & operator) and a restricted one (content models must be unambiguous). The semantics of content models in SGML DTDs can be modified by exceptions (XML does not allow exceptions). Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. We give precise definitions of the semantics of exceptions and prove that they do not increase the expressive power of SGML DTDs when DTDs are restricted according to accepted SGML practice. We prove the following results:

1. Exceptions do not increase the expressive power of extended context-free grammars.
2. For each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar.
3. For each DTD with exceptions, we can construct a structurally equivalent DTD when the DTD is restricted to adhere to accepted SGML practice.
4. Exceptions are a powerful shorthand notation: eliminating them may cause exponential growth in the size of an extended context-free grammar or of a DTD.
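
A minimal Python sketch can make the semantics of exceptions concrete. It rests on a hypothetical encoding rather than SGML syntax: a DTD is a dict mapping each element name to a triple (content model, inclusions, exclusions), a content model is a Python regular expression over tokens such as <para>, and a document is a tree of (name, children) tuples whose character data is ignored. The helper names (with_inclusions, valid, el) and the footnote DTD are illustrative only. Two rules are assumed here rather than taken from the text above: exceptions declared on an element apply to its whole subtree, and an exclusion overrides an inclusion of the same element. Inclusions are modelled by allowing any sequence of included elements before the first token and after every token of the content model; exclusions reject content that contains an excluded element. The sketch does not check that content models are unambiguous.

```python
import re

# Hypothetical encoding (not SGML syntax): a DTD maps each element name to
# (content model, inclusions, exclusions); a content model is a Python regex
# over tokens such as <para>; a document is a tree of (name, children) tuples.
# Character data is ignored.
TOKEN = re.compile(r"<[A-Za-z]\w*>")


def with_inclusions(model, active):
    """Regex for the words of L(model) with words over `active` inserted anywhere."""
    if not active:
        return model
    star = "(?:" + "|".join(f"<{n}>" for n in sorted(active)) + ")*"
    # Allow included elements before the first token and after every token.
    return star + TOKEN.sub(lambda m: f"(?:{m.group(0)}{star})", model)


def valid(node, dtd, inc=frozenset(), exc=frozenset()):
    """Check a (name, children) tree against a DTD with exceptions.

    Assumed semantics: exceptions declared on an element apply to its whole
    subtree, and an exclusion overrides an inclusion of the same element.
    """
    name, children = node
    model, own_inc, own_exc = dtd[name]
    inc = inc | frozenset(own_inc)
    exc = exc | frozenset(own_exc)
    names = [child[0] for child in children]
    if any(n in exc for n in names):
        return False  # an excluded element occurs in the content
    word = "".join(f"<{n}>" for n in names)
    if re.fullmatch(with_inclusions(model, inc - exc), word) is None:
        return False
    return all(valid(child, dtd, inc, exc) for child in children)


def el(name, *children):
    """Build a (name, children) tree node."""
    return (name, list(children))


# Hypothetical footnote DTD.
DTD = {
    "doc": ("<title>(?:<para>)+", {"fn"}, set()),  # fn may occur anywhere below doc
    "title": ("(?:<em>)*", set(), {"fn"}),          # ... but never inside a title
    "para": ("(?:<em>)*", set(), set()),
    "em": ("", set(), set()),
    "fn": ("(?:<para>)+", set(), {"fn"}),           # ... nor inside another fn
}

print(valid(el("doc", el("title"), el("para", el("em"), el("fn", el("para")))), DTD))
print(valid(el("doc", el("title", el("fn", el("para"))), el("para")), DTD))
```

With this DTD, valid accepts a footnote inside a paragraph or directly between paragraphs, because the single inclusion on doc is inherited by every descendant, and it rejects a footnote inside a title or inside another footnote, because the exclusions on title and fn override that inclusion.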

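The elimination results (2 and 4) can be illustrated, under a similar hypothetical encoding, by the straightforward construction that specialises each element type to the exception context in which it occurs; this is a sketch of the general idea, not necessarily the construction used in the paper. Every reachable triple (element, inherited inclusions, inherited exclusions) becomes a nonterminal of an exception-free grammar. Inside each copy, tokens of excluded elements are replaced by a regular expression that matches nothing, and the active inclusions are inserted before the first token and after every token, so several nonterminals may stand for the same element name. The names label, specialise, eliminate and chain_family are hypothetical; chain_family(k) is a contrived DTD in which k elements may nest in any order and each excludes itself.

```python
import re
from collections import deque

# Hypothetical encoding: a DTD maps each element name to (content model,
# inclusions, exclusions), where a content model is a Python regex over tokens
# such as <p>. Names are restricted to word characters so that the generated
# nonterminal labels are safe to embed in a regex.
TOKEN = re.compile(r"<([A-Za-z]\w*)>")


def label(name, inc, exc):
    """Nonterminal for `name` under inherited inclusions inc and exclusions exc."""
    return f"{name}@i={','.join(sorted(inc))};x={','.join(sorted(exc))}"


def specialise(model, inc, exc):
    """Rewrite a content model for context (inc, exc) and rename its tokens."""
    active = sorted(inc - exc)  # an exclusion overrides an inclusion
    star = ""
    if active:
        star = "(?:" + "|".join(f"<{label(c, inc, exc)}>" for c in active) + ")*"

    def rename(m):
        child = m.group(1)
        if child in exc:
            return "(?:(?!))"  # excluded element: this atom matches nothing
        return f"(?:<{label(child, inc, exc)}>{star})"

    return star + TOKEN.sub(rename, model), active


def eliminate(dtd, root):
    """Return {nonterminal: content model} for an exception-free grammar."""
    start = (root, frozenset(), frozenset())
    grammar, seen, queue = {}, {start}, deque([start])
    while queue:
        name, inc, exc = queue.popleft()
        model, own_inc, own_exc = dtd[name]
        # The context in force inside this element is inherited by its children.
        inc2, exc2 = inc | frozenset(own_inc), exc | frozenset(own_exc)
        new_model, active = specialise(model, inc2, exc2)
        grammar[label(name, inc, exc)] = new_model
        # Enqueue a specialised copy of every child that can still occur.
        for child in (set(TOKEN.findall(model)) - exc2) | set(active):
            state = (child, inc2, exc2)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return grammar


def chain_family(k):
    """Hypothetical DTD family: e1..ek may nest in any order, each excludes itself."""
    names = [f"e{i}" for i in range(1, k + 1)]
    body = "(?:" + "|".join(f"<{n}>" for n in names + ["p"]) + ")*"
    dtd = {"doc": (body, set(), set()), "p": ("", set(), set())}
    for n in names:
        dtd[n] = (body, set(), {n})
    return dtd


for k in range(1, 5):
    g = eliminate(chain_family(k), "doc")
    copies_of_p = sum(1 for nt in g if nt.startswith("p@"))
    print(k, len(g), copies_of_p)
```

Running the loop prints 1 4 2, 2 9 4, 3 21 8 and 4 49 16: the type p needs one copy for every subset of {e1, ..., ek}, so this construction grows exponentially in k for this family. The sketch only exhibits that growth; the claim that such growth can be unavoidable is the paper's result, not something the code proves.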