Algorithmic typology and going from known to similar unknown categories within and across languages

This paper introduces three algorithms for the extraction of lexical and grammatical markers in parallel texts. The starting point for all of them is that trigger distributions are used as semantic cues. Automatic processing chains apply the same procedures (so-called “procedural universals”) to directly comparable texts of all languages. The domain-internal distribution of markers is usually highly diverse cross-linguistically due to polymorphy (many markers instantiate the same domain while also expressing other meanings). Polymorphy structures a domain into subdomains in cross-linguistically different ways, and this structure can be used for the aggregation of markers into cross-linguistically recurrent marker types and for assessing the domain-specific similarity relationships between languages.