Half-steps toward LMNL

Overlap in markup occurs where markup structures do not nest, as when the sentence and phrase boundaries of a poem and its metrical line structure describe different hierarchies. LMNL (Layered Markup and Annotation Language) is a model for representing textual data, designed to recognize and account for layer separation and markup overlap. LMNL is specified as a data model, not as a syntax, but without a syntax and an API it is very difficult to experiment with the model. I demonstrate a subset of LMNL using an XML syntax and some severe restrictions on LMNL (thus “half-LMNL”). Using an attribute structure for milestone marking and correspondence allows the input to be processed as XML and parsed into a tree. If this tree is flattened, reducing all XML markup to empty XML elements that demarcate fragments of text, it can be transformed again to produce a modified “reified LMNL” model that includes overlapping ranges. This XML representation of a LMNL model takes the form, in effect, of standoff markup, although the technique preserves tag ordering (as full LMNL would not).
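As a rough illustration of the flattening step described above, the following Python sketch parses a small XML document into a tree and then reduces every element to a named range over the plain text. It is not the syntax or the transformation used in the paper: the attribute names sID and eID for paired start and end milestones, the sample text, and the function name to_ranges are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <line> elements nest normally, while the sentence <s>
# is marked by two empty milestone elements linked through sID / eID.
SOURCE = (
    '<poem>'
    '<line>Sing, Muse. <s sID="s1"/>Tell me </line>'
    '<line>of the man.<s eID="s1"/> He wandered</line>'
    '</poem>'
)

def to_ranges(xml_source):
    """Flatten the XML tree into (text, ranges); ranges may overlap."""
    root = ET.fromstring(xml_source)
    text = []      # text fragments, in document order
    ranges = []    # [name, start, end], kept in start-tag order
    pending = {}   # milestone id -> index of its open range
    offset = 0     # current character offset into the plain text

    def add_text(s):
        nonlocal offset
        if s:
            text.append(s)
            offset += len(s)

    def walk(el):
        if "sID" in el.attrib:        # start milestone: open a range
            pending[el.attrib["sID"]] = len(ranges)
            ranges.append([el.tag, offset, None])
        elif "eID" in el.attrib:      # end milestone: close the matching range
            ranges[pending.pop(el.attrib["eID"])][2] = offset
        else:                         # ordinary element: range spans its content
            slot = len(ranges)
            ranges.append([el.tag, offset, None])
            add_text(el.text)
            for child in el:
                walk(child)
                add_text(child.tail)
            ranges[slot][2] = offset

    walk(root)
    return "".join(text), [tuple(r) for r in ranges]

text, ranges = to_ranges(SOURCE)
for name, start, end in ranges:
    print(name, start, end, repr(text[start:end]))
```

In this sketch the s range (12 to 31) begins inside the first line range (0 to 20) and ends inside the second (20 to 43), so the result records an overlap that the XML tree alone could not express. The list keeps ranges in start-tag order, which loosely mirrors the tag-order preservation mentioned in the text.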
