Canonical Forms of XML Schemas

This paper studies certain transformations of XML schemas that are widely used in XML data management algorithms. Because the properties and functional characteristics of XML documents differ considerably from those of other types of data, a number of typical data management problems (such as XML data validation, schema inference, and data translation to/from other models) are more complicated to solve for XML. The general idea of our approach to these problems is to transform the original structure (i.e., the structural schema constraints) into another structure without losing information about the properties of the original data that are important for applications. This technique has been used successfully in various algorithms for solving problems of this kind. In this paper, we discuss a systematic approach to these problems. We present methods for reducing XML schemas to several canonical forms and examine algorithms that solve the data management problems for data satisfying schemas represented in these canonical forms.