A Conceptual Model for Acquisition of Morphological Features of Highly Agglutinative Tamil Language Using Unsupervised Approach

Building powerful computer systems that understand human (natural) languages and capture information about various domains requires a well-designed model of morphological features at their core. Morphological analysis is a crucial step that plays a central role in natural language processing. It involves studying the structure and formation of words and their functional units, and identifying morphemes, with the aim of formulating the rules of the language. Because natural language processing applications such as machine translation, speech recognition, and information retrieval depend on large volumes of text, analyzing that data by hand with linguistic expertise is not viable. To overcome this issue, morphological analysis is carried out in an unsupervised setting, an alternative procedure that uncovers the morphological structure of a language without depending on such expertise. This paper presents a theoretical model for the unsupervised morphological analysis of the Tamil language.
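As a small illustration of what unsupervised analysis can recover, the following Python sketch applies the successor-variety heuristic usually credited to Zellig Harris: a morpheme boundary is proposed after a prefix when many different characters can follow that prefix in the vocabulary. This is not the model proposed in this paper. The function names `successor_sets` and `segment`, the `threshold` and `min_len` parameters, and the toy word list (simplified romanizations of forms of the Tamil verb படி, "study") are all assumptions made for illustration. Romanized forms are used because in Tamil script the vowel signs are separate Unicode code points, so cutting at code points could split a single written syllable.

```python
from collections import defaultdict

END = "#"  # end-of-word marker, so a complete word counts as one possible continuation


def successor_sets(vocab):
    """For every prefix in the vocabulary, collect the distinct characters that follow it."""
    followers = defaultdict(set)
    for word in vocab:
        padded = word + END
        for i in range(1, len(word) + 1):
            followers[word[:i]].add(padded[i])
    return followers


def segment(word, followers, threshold=2, min_len=2):
    """Cut `word` after every prefix whose successor variety reaches `threshold`,
    keeping every segment at least `min_len` characters long (illustrative defaults)."""
    cuts = [0]
    for i in range(1, len(word)):
        # .get does not insert into the defaultdict; unseen prefixes have variety 0
        variety = len(followers.get(word[:i], ()))
        if variety >= threshold and i - cuts[-1] >= min_len and len(word) - i >= min_len:
            cuts.append(i)
    cuts.append(len(word))
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy vocabulary: romanized forms of the Tamil verb "padi" (study).
vocab = ["padikkiraan", "padikkiraal", "padikkirom",
         "padiththaan", "padiththaal", "padippaan"]
followers = successor_sets(vocab)
for w in vocab:
    print(w, "->", segment(w, followers))
```

On this toy list the first cut isolates the shared stem `padi` (followed by `k`, `t`, or `p`), and a second cut separates the present-tense marker `kkir` from the endings `aan`, `aal`, and `om`. The `min_len` guard keeps a one-character remainder such as the final `n` of `aan` from being split off. Real Tamil text, with sandhi and stem alternations, would need a far larger corpus and more robust unsupervised methods than this heuristic.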
