Layering and Merging Linguistic Annotations

The American National Corpus (ANC) and its annotations are represented in a stand-off XML format that complies with the specifications of the Linguistic Annotation Framework developed by ISO TC37 SC4 WG1. Because few of the systems currently used to search and access the corpus support stand-off markup, the project has developed a SAX-like parser that merges the annotations with the ANC data and outputs the result with annotations in-line, in a variety of formats.
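To make the idea of merging stand-off annotations into in-line markup concrete, here is a minimal Python sketch. It is not the ANC project's parser: the function name `merge_standoff`, the `(start, end, tag)` tuple representation, and the sample data are all hypothetical, and the sketch assumes that spans are non-empty character offsets that nest properly (no partial overlap).

```python
from xml.sax.saxutils import escape


def merge_standoff(text, annotations):
    """Insert (start, end, tag) stand-off annotations into text as in-line XML.

    Hypothetical illustration only. Assumes non-empty spans that nest
    properly; partially overlapping spans are not handled.
    """
    events = []
    for i, (start, end, tag) in enumerate(annotations):
        length = end - start
        # Opening tags: at the same offset, longer spans open first so they
        # enclose shorter ones; ties keep the input order.
        events.append((start, 1, -length, i, f"<{tag}>"))
        # Closing tags: sorted before opening tags at the same offset, and
        # shorter spans close first; ties close in reverse input order.
        events.append((end, 0, length, -i, f"</{tag}>"))
    events.sort()

    out = []
    pos = 0
    for offset, _, _, _, markup in events:
        # Copy the primary text up to this offset, escaping XML specials.
        out.append(escape(text[pos:offset]))
        out.append(markup)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    sample = "Dogs bark."
    layers = [(0, 10, "s"), (0, 4, "w"), (5, 9, "w")]
    print(merge_standoff(sample, layers))
```

Because the primary text is left untouched and the annotations are only a list of offsets, several annotation layers (for example sentence and token boundaries) can be concatenated into one list and merged in a single pass, as long as the combined spans still nest.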
