Modeling XML Content Explained

Over the years, a great deal of course material has been developed to teach undergraduate students the fundamentals of XML and of schema languages such as DTD and XML Schema. Typically, this material discusses the syntax of these languages and gives examples, but it often does not cover how to find a schema for given XML content. As a result, students have difficulty getting started when modeling a complex schema, many of the XML schemas they infer are too liberal, and some are even incorrect. In this paper we present a systematic approach to modeling XML content that is based on rewriting regular expressions. A small-scale experiment has demonstrated that the approach improves the quality of the resulting models and helps students get started with modeling XML content.