Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach

Punjabi text classification is the process of assigning predefined classes to unlabelled Punjabi text documents. Because of the dramatic increase in the amount of content available in digital form, text classification has become an urgent need for managing digital data efficiently and accurately. Until now, no text classifier has been available for Punjabi text documents. Therefore, in this paper, existing classification algorithms such as Naïve Bayes and centroid-based techniques are used for Punjabi text classification. In addition, one new approach is proposed for Punjabi text documents: a combination of Naïve Bayes (to extract the relevant features and so reduce dimensionality) and ontology-based classification (which acts as the text classifier using the extracted features). These algorithms are applied to 184 Punjabi news articles on
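
To make the centroid-based technique named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes documents are already tokenized, uses plain term frequencies with cosine similarity, and trains on a tiny hypothetical data set of transliterated placeholder tokens; the function names and the class labels "sports" and "politics" are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """Return an L2-normalised term-frequency vector as a dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(labelled_docs):
    """Average the normalised vectors of each class into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for tokens, label in labelled_docs:
        sizes[label] += 1
        for term, weight in tf_vector(tokens).items():
            sums[label][term] += weight
    return {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, centroids):
    """Assign the class whose centroid is most similar to the document."""
    vec = tf_vector(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training data (placeholder tokens, not from the paper).
TRAINING = [
    (["cricket", "match", "team"], "sports"),
    (["hockey", "team", "goal"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "minister", "vote"], "politics"),
]

centroids = train_centroids(TRAINING)
print(classify(["team", "match"], centroids))
print(classify(["vote", "minister"], centroids))
```

In a hybrid setup like the one the abstract describes, the token lists passed to `classify` would first be reduced to the features selected by the Naïve Bayes step; that filtering is omitted here because the abstract does not specify it.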
