Constraints for corpora development and validation

In this paper we consider a corpus to be a set of XML documents. The guidelines for creating a corpus determine the semantics of the data stored in it. Usually the guidelines prescribe the actual structure of the corpus, the symbols used, their meaning, and the relations among them. Ideally, the software supporting the creation of a corpus should allow all the constraints that follow from the guidelines to be imposed on the XML representation of the corpus. To the best of our knowledge, no such software exists yet. The main problems stem from the complexity of the data in the corpus and the impossibility of formalizing it completely.
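
To make the notion of a constraint over the XML representation concrete, the following minimal sketch checks one guideline-style rule using Python's standard xml.etree.ElementTree module. The element name w, the attribute pos, and the tagset are hypothetical and chosen only for illustration; they are not taken from any particular guidelines, and the sketch is not the software discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagset: an assumption for illustration, not from real guidelines.
ALLOWED_POS = {"N", "V", "A"}


def check_tokens(xml_text):
    """Return violation messages for <w> elements in one XML document.

    Assumed constraint: every <w> element carries a 'pos' attribute
    whose value belongs to ALLOWED_POS.
    """
    root = ET.fromstring(xml_text)
    violations = []
    # iter("w") visits every <w> element in document order.
    for i, w in enumerate(root.iter("w"), start=1):
        pos = w.get("pos")
        if pos is None:
            violations.append(f"token {i} ({w.text!r}): missing 'pos' attribute")
        elif pos not in ALLOWED_POS:
            violations.append(f"token {i} ({w.text!r}): unknown tag {pos!r}")
    return violations


# A tiny hypothetical sentence: the second and third tokens break the rule.
sample = '<s><w pos="N">dog</w><w pos="X">runs</w><w>fast</w></s>'
for message in check_tokens(sample):
    print(message)
```

A rule of this kind covers only the part of the guidelines that can be stated formally. Constraints that depend on linguistic judgement cannot be checked this way, which is the limitation noted above.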