Towards Full Lexical Recognition

Text processing in Serbian is based on the Intex format system of electronic dictionaries. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain unrecognized. In this paper we present two aspects of e-dictionary enhancement that provide for additional recognition of two important categories of words: named entities and words generally not recorded in traditional dictionaries. We first describe the structure and content of dictionaries of proper names, both personal and geographic, developed to recognize the corresponding classes of named entities. Then we present a set of lexical transducers expressing morphological rules governing word formation, developed for the recognition of unknown words. The resources presented significantly improve the lexical recognition process.