Sentence level text classification in the Kannada language - a classifier's perspective

Better information retrieval (IR) techniques are needed to address the problem of information explosion. A major portion of the data available online is text, which gives rise to a huge feature space; hence, structured organisation and retrieval are very important. IR in the context of Indian languages is not uncommon, but IR in the South Indian language Kannada is quite new. This work focuses on sentence-level text classification in the Kannada language, which is a fine-grained approach to text classification; here, we look at the suitability of classifiers such as naive Bayesian, bag of words and support vector machine (SVM) for this task. Dimensionality reduction is carried out using two different approaches, minimum term frequency and stop word removal, and the performance of the above-mentioned classifiers is analysed.
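As an illustration of the two dimensionality reduction approaches named above, the following is a minimal sketch and not the implementation used in this work. The function names, the whitespace tokenisation, the English toy sentences, the stop word list and the threshold value are all assumptions chosen for readability; a Kannada corpus would need its own tokeniser and stop word list.

```python
from collections import Counter


def reduce_vocabulary(sentences, stop_words, min_term_freq):
    """Keep terms that are not stop words and occur at least min_term_freq times."""
    counts = Counter(
        token
        for sentence in sentences
        for token in sentence.split()  # assumption: whitespace tokenisation
        if token not in stop_words
    )
    return {term for term, freq in counts.items() if freq >= min_term_freq}


def to_features(sentence, vocabulary):
    """Term counts for one sentence, restricted to the reduced vocabulary."""
    return Counter(token for token in sentence.split() if token in vocabulary)


# Toy data (hypothetical, for illustration only).
sentences = [
    "the match was won by the home team",
    "the team lost the match",
    "the market rose and the index rose",
]
stop_words = {"the", "was", "by", "and"}

vocab = reduce_vocabulary(sentences, stop_words, min_term_freq=2)
print(sorted(vocab))
```

Each sentence can then be mapped to a count vector over the reduced vocabulary with `to_features` and passed to whichever classifier is being compared; raising `min_term_freq` or enlarging the stop word list shrinks the feature space further.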
