Processing of Serbian texts relies on a system of electronic dictionaries in the Intex format. Although lexical recognition succeeds for 75% to 90% of word forms, depending on the type of text, some categories of words remain unrecognized. In this paper we present two enhancements of the e-dictionaries that enable the recognition of two important categories of words: named entities, and words that are generally not recorded in traditional dictionaries. We first describe the structure and content of the dictionaries of proper names, both personal and geographic, developed to recognize the corresponding classes of named entities. We then present a set of lexical transducers that express the morphological rules governing word formation and are used to recognize unknown words. The presented resources significantly improve lexical recognition.
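The two-step idea in the abstract works as follows: look a word form up in a dictionary of proper names, and if it is absent, try word-formation rules that relate it to a known entry. The Python sketch below illustrates this idea under explicit assumptions. It is not the authors' implementation. Intex expresses such rules as finite-state transducers, and the code replaces them with plain suffix stripping. The line format (`form,lemma.Tags:codes`, with an empty lemma meaning "same as the form") imitates the DELAF style of Intex-type dictionaries. The sample entries, the inflectional codes, the tags `First` and `A+Poss`, and all names (`SAMPLE`, `RULES`, `Entry`, `parse_line`, `build_index`, `analyze`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sample in a DELAF-like layout: form,lemma.Tags:codes.
# An empty lemma means the lemma equals the form. Entries and codes are
# illustrative assumptions, not data from the Serbian dictionaries.
SAMPLE = """\
Beograd,.N+NProp+Top:ms1q:ms4q
Beograda,Beograd.N+NProp+Top:ms2q
Marko,.N+NProp+First:ms1v
Milan,.N+NProp+First:ms1v
Ana,.N+NProp+First:fs1v
"""

# Hypothetical word-formation rules: (derived suffix, base ending to restore,
# category of the derived word). Each models a possessive adjective from a first name.
RULES = [
    ("ov", "o", "A+Poss"),  # Marko -> Markov
    ("ov", "", "A+Poss"),   # Milan -> Milanov
    ("in", "a", "A+Poss"),  # Ana -> Anin
]


@dataclass(frozen=True)
class Entry:
    form: str
    lemma: str
    tags: tuple[str, ...]   # e.g. ("N", "NProp", "Top")
    codes: tuple[str, ...]  # e.g. ("ms1q", "ms4q")


def parse_line(line: str) -> Entry:
    """Split one 'form,lemma.Tags:codes' line (escaped commas/dots are not handled)."""
    form, rest = line.split(",", 1)
    lemma, rest = rest.split(".", 1)
    tag_part, _, code_part = rest.partition(":")
    codes = tuple(c for c in code_part.split(":") if c)
    return Entry(form, lemma or form, tuple(tag_part.split("+")), codes)


def build_index(text: str) -> dict[str, list[Entry]]:
    """Map every inflected form to all entries that share it."""
    index: dict[str, list[Entry]] = {}
    for line in text.splitlines():
        if line.strip():
            entry = parse_line(line.strip())
            index.setdefault(entry.form, []).append(entry)
    return index


def analyze(word: str, index: dict[str, list[Entry]]) -> tuple[str, list[tuple[str, str]]]:
    """Return ('dict', ...) for listed forms, ('derived', ...) when a rule links
    the word to a known first name, and ('unknown', []) otherwise."""
    if word in index:
        return "dict", [(e.lemma, "+".join(e.tags)) for e in index[word]]
    derived = []
    for suffix, restore, category in RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)] + restore
            for e in index.get(base, []):
                if "First" in e.tags:  # these rules only apply to first names
                    derived.append((e.lemma, category))
    return ("derived", derived) if derived else ("unknown", [])


index = build_index(SAMPLE)
print(analyze("Beograda", index))  # ('dict', [('Beograd', 'N+NProp+Top')])
print(analyze("Markov", index))    # ('derived', [('Marko', 'A+Poss')])
```

The sketch handles only base forms of the derived adjectives. Inflected forms such as *Markovog* would need further rules. A transducer-based system can compose derivation and inflection in a single device.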