Paper information - An analysis of sentence level text classification for the Kannada language

With the rapid growth of the internet, a huge amount of data is available online, and drawing useful information from this digital data is challenging. Exploring and extracting information from native-language content available online is a particularly useful task. The work presented here focuses on sentence-level classification in the Kannada language. Two of the most popular approaches to text categorization, Naïve Bayesian and Bag of Words (BOW), are used in this work. The Bag of Words approach is found to perform significantly better than the Naïve Bayesian approach. The objective of the work is to find out how sentence-level classification works for the Kannada language, since it can be extended to sentiment classification, question answering, text summarization, and the analysis of customer reviews in Kannada blogs. Because most user comments, queries, and opinions are expressed as sentences, sentence-level text classification is a special case of the text classification problem. Although the work presently focuses on very basic approaches, it can later be extended to other methods such as SVM and KNN.
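As a rough illustration of what sentence-level classification with word counts can look like, the following is a minimal sketch of a multinomial Naïve Bayes classifier over whitespace-separated tokens. It is not the authors' implementation: the abstract does not describe their features, smoothing, or data, so the tokenizer, the add-one smoothing, the function names, and the English toy sentences and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Assumption: plain whitespace tokenization. Real Kannada text would
    # likely need language-specific normalization that is not shown here.
    return sentence.lower().split()


def train_naive_bayes(labeled_sentences):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for sentence, label in labeled_sentences:
        tokens = tokenize(sentence)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def classify(sentence, model):
    """Return the class with the highest log-posterior for the sentence."""
    word_counts, class_counts, vocabulary = model
    total_sentences = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_sentences)
        # Denominator for add-one (Laplace) smoothing, an assumed choice.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(sentence):
            if token in vocabulary:  # tokens unseen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, in English for readability only.
TRAINING = [
    ("the team won the match", "sports"),
    ("the batsman scored a century", "sports"),
    ("the bank raised interest rates", "business"),
    ("the market closed higher today", "business"),
]

model = train_naive_bayes(TRAINING)
print(classify("the bank cut rates", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and skipping out-of-vocabulary tokens keeps unseen words from affecting every class equally.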

K. Srikanta Murthy | R. Jayashree
