Preparing children's writing database for automated processing

This paper describes the process of anonymizing a publicly available German corpus of children’s spontaneously written texts from Grades 1-8, which were scanned and digitized. After a review of the previously published data collection process, the methods for anonymizing the texts and the metadata are described. A revised annotation set that was added to the existing transcription is then defined. This annotation supports spelling error analysis and adds a further layer at the syntax level, so that spelling and syntactic issues can be processed separately. Finally, updated statistics for the new version of the data are reported to give the reader an idea of the research potential this version may offer.