The Database of Estonian Word Families: a Language Technology Resource

This paper describes a polyfunctional database of Estonian word families. The database is based on extensive research and contains detailed word formation information about the Estonian vocabulary. It is an XML database integrated into a dictionary management system that supports structure-based editing and searching, data reuse, etc. The design of the database follows the word families method, which organizes words on the basis of common stem morphemes and word formation relations. Until now, the word families method has been used in the compilation of word formation dictionaries. Applying it to the compilation of a database is a novel solution that considerably broadens both access to word formation data and its possible uses. The database provides material for researchers in computational and general linguistics, language learners and teachers, and lexicographers. The data can also be used in language technology applications such as search engines and text-to-speech synthesis.

The study was supported by the National Programme for Estonian Language Technology and by the project SF0050023s09 “Modeling intermodular phenomena in Estonian”.