Automatic Collection and Analysis of German Compounds

In this paper we report on an exploration of noun-noun compounds in a large German corpus. The morphological parsing that analyzes words into stems and suffixes was entirely data-driven: no knowledge of German was used to determine the correct set of stems and suffixes, nor how to break any given word into its component morphemes. To discover compounds, however, we used our prior knowledge of the structure of German nominal compounds, in a way that we describe at greater length below. This case is of interest because German compounds (unlike English compounds, but like those in many other languages, especially in the Indo-European family) include a linking element (Fugenelement in German) placed between the two stems. Traditional grammars report nine possible linking elements: e, es, en, er, n, ens, ns, s, and zero (see Duden 1995). They also report that the Left Element determines which linking element is appropriate for a given nominal compound.
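The compound structure described above (Left Element + linking element + Right Element) can be illustrated with a small sketch. This is not the procedure used in this paper, whose stems are induced from the corpus rather than supplied by hand. It assumes a hypothetical toy lexicon of lowercase noun stems (`STEMS`) and a hypothetical function `candidate_splits`, and it simply enumerates every way to write a word as a known stem, one of the nine linkers (with zero as the empty string), and another known stem.

```python
# The nine linking elements listed in traditional grammars (Duden 1995);
# the empty string "" stands for the zero linker.
LINKERS = ["", "e", "es", "en", "er", "n", "ens", "ns", "s"]

# Hypothetical toy lexicon of lowercase noun stems, for illustration only.
STEMS = {"arbeit", "amt", "haus", "tür", "hund", "hütte"}


def candidate_splits(word, stems):
    """Return every (left, linker, right) triple with word == left + linker + right,
    where left and right are both in the given stem set."""
    w = word.lower()
    results = []
    # Try every position for the end of the Left Element.
    for i in range(1, len(w)):
        left = w[:i]
        if left not in stems:
            continue
        rest = w[i:]
        # Try each linker; whatever follows it must be a known Right Element.
        for linker in LINKERS:
            if rest.startswith(linker):
                right = rest[len(linker):]
                if right and right in stems:
                    results.append((left, linker, right))
    return results


print(candidate_splits("Arbeitsamt", STEMS))  # → [('arbeit', 's', 'amt')]
print(candidate_splits("Hundehütte", STEMS))  # → [('hund', 'e', 'hütte')]
print(candidate_splits("Haustür", STEMS))     # → [('haus', '', 'tür')]
```

The sketch returns all candidate analyses rather than a single one. Since several linkers share a prefix (e, es, en, ens), a larger lexicon can yield more than one split for the same word. It also does not model the Left Element's choice of linker, which a fuller treatment would have to take into account.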