Compiler-construction Tools and Techniques for SGML Parsers: Difficulties and Solutions

The Standard Generalized Markup Language (SGML) is used to represent documents in an application-independent manner. In a recent paper, Nordin et al. concisely analyze which properties of SGML hinder its more widespread use and acceptance. In particular, they identify a number of features in the SGML standard that make it difficult to apply commonly used implementation tools and techniques when building an SGML parser. One feature, however, or rather one combination of two features, escapes their notice. Unambiguity and the & operator were both intended to make SGML document grammars easier for humans to read. It is questionable, though, whether this goal is really achieved. At the least, the combination of unambiguity and the & operator raises unforeseen problems both in validating the grammars and in parsing the documents by machine. I describe these problems here in detail. On the basis of this analysis, the standards committees that are currently revising the standard can make an informed decision on the future of the two features.