Morphological analysis is an essential component of Natural Language Processing (NLP) applications ranging from spell checkers to machine translation. It segments a word into morphemes and analyzes how those morphemes attach to one another. Word formation in English is less complex than in Indic languages, so Tamil poses its own difficulties when building an NLP application. The morphemes of a language, the rules by which they combine, and the changes that occur when they attach are the key factors to consider when building a morphological analyzer for any language. Our “Morphological Analyzer and Generator for Tamil Language” generates the word forms of a stem or root for a given context and, in the other direction, analyzes a Tamil surface form into its proper context. The model covers only nouns and verbs. This paper illustrates how the lexicon and the orthographic rules of Tamil have been written as regular expressions using only finite-state operations, and how this approach has been implemented to develop a morphological analyzer/generator. The model is built with the Xerox toolkit, which uses “Two-level Morphology”, and almost 2000 noun stems and 96 verb stems have been incorporated into the network. A noun stem currently produces about 40 different forms, and a verb stem produces up to 240 forms. We have also defined our own transliteration scheme for this purpose.
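The idea of pairing a lexicon with orthographic rewrite rules, and running the same description in both directions, can be sketched in a few lines of Python. This is only a toy illustration, not the Xerox toolkit or the network described here: the two romanized stems, the tag names (`+N+Sg`, `+N+Pl`), the `^` boundary symbol, the single assimilation rule, and the functions `generate` and `analyze` are all assumptions made for the example, and the romanization is not the transliteration scheme defined in this work. Analysis is done by enumerating every generated form and inverting the table, which stands in for the inversion a real finite-state transducer provides.

```python
import re

# Hypothetical toy lexicon (romanized; not the paper's transliteration scheme).
STEMS = ["maram", "kaal"]

# Hypothetical tag -> morpheme mapping; "^" marks a morpheme boundary.
SUFFIXES = {"+N+Sg": "", "+N+Pl": "^kaL"}

# Ordered rewrite rules applied to the intermediate form.
RULES = [
    (re.compile(r"m\^k"), "ngk"),  # illustrative assimilation at the boundary
    (re.compile(r"\^"), ""),       # delete any remaining boundary symbols
]


def generate(stem, tags):
    """Map a lexical form (stem + tags) to a surface form."""
    if stem not in STEMS or tags not in SUFFIXES:
        raise KeyError(f"unknown lexical form: {stem}{tags}")
    form = stem + SUFFIXES[tags]
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def build_analyzer():
    """Invert generation by enumerating every form the toy lexicon yields."""
    table = {}
    for stem in STEMS:
        for tags in SUFFIXES:
            table.setdefault(generate(stem, tags), []).append(stem + tags)
    return table


ANALYSES = build_analyzer()


def analyze(surface):
    """Map a surface form back to its lexical analyses (empty if unknown)."""
    return ANALYSES.get(surface, [])


if __name__ == "__main__":
    print(generate("maram", "+N+Pl"))
    print(analyze("kaalkaL"))
```

Enumeration works only because the toy lexicon is tiny. With thousands of stems and hundreds of forms per stem, a compiled finite-state network avoids materializing every form.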