Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization
This paper deals with the automatic construction of a lemmatizer from a Full Form – Lemma (FFL) training dictionary and with the lemmatization of new words that do not occur in the FFL dictionary, i.e. out-of-vocabulary (OOV) words. Three methods are introduced for lemmatizing three kinds of OOV words: missing full forms, unknown words, and compound words. The methods were tested on Czech test data. The best result (recall 99.3 %, precision 75.1 %) was achieved by combining these methods. A lexicon-free lemmatizer based on the method for unknown words (the lemmatization patterns method) is also introduced.
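The abstract does not spell out how lemmatization patterns are built, so the following is only a minimal sketch of one plausible reading: suffix-rewrite rules are learned from FFL pairs and then applied to an unseen word. The function names (`learn_patterns`, `lemmatize_oov`), the `min_stem` threshold, and the toy Czech-like pairs are all assumptions made for illustration, not the authors' implementation or data.

```python
from collections import Counter, defaultdict


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_patterns(ffl_pairs):
    """Count (form_suffix, lemma_suffix) rewrite rules seen in FFL pairs.

    Hypothetical helper: the rule format is an assumption, not taken
    from the paper.
    """
    patterns = Counter()
    for form, lemma in ffl_pairs:
        k = common_prefix_len(form, lemma)
        patterns[(form[k:], lemma[k:])] += 1
    return patterns


def lemmatize_oov(word, patterns, min_stem=2):
    """Return candidate lemmas for an unseen word.

    Candidates are ordered by how often their rule was observed,
    with ties broken alphabetically. min_stem is an assumed guard
    against rules that would consume almost the whole word.
    """
    candidates = defaultdict(int)
    for (form_suffix, lemma_suffix), count in patterns.items():
        stem_len = len(word) - len(form_suffix)
        if word.endswith(form_suffix) and stem_len >= min_stem:
            candidates[word[:stem_len] + lemma_suffix] += count
    return sorted(candidates, key=lambda c: (-candidates[c], c))


# Toy FFL dictionary (illustrative pairs only).
ffl = [
    ("hradu", "hrad"),
    ("hrady", "hrad"),
    ("knihy", "kniha"),
    ("knihu", "kniha"),
]
patterns = learn_patterns(ffl)
print(lemmatize_oov("mostu", patterns))
```

In this toy setting the unseen form "mostu" matches two learned rules and therefore yields two candidates, "most" and "mosta". Returning every matching candidate favours recall over precision; whether this accounts for the figures reported above is not stated in the abstract.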