DSD: A schema language for XML

XML (eXtensible Markup Language) is a linear syntax for trees, which has gathered a remarkable amount of interest in industry. The acceptance of XML opens new avenues for the application of formal methods, such as the specification of abstract syntax tree sets and tree transformations. A notation for defining a set of XML trees is called a schema language. Such trees correspond to a specific user domain, such as XHTML, the class of XML documents that make sense as HTML. A useful schema notation must: identify most of the syntactic requirements that the documents in the user domain follow; allow efficient parsing; be readable to the user; allow limited tree transformations corresponding to the insertion of defaults; and be modular and extensible to support evolving classes of XML documents.

In the present paper, we introduce the DSD (Document Structure Description) notation as our bid on how to meet the requirements above.