Data Mining: Document Classification using Naive Bayes Classifier

In data mining, classification is the task of splitting data into several dependent and independent regions, each of which is referred to as a class. Different kinds of classifiers are used to accomplish this task. Classification is, however, more limited when the items being classified are text documents. The aim of the work presented in this article is to evaluate multiclass document classification and to determine the accuracy that can be achieved when classifying text documents. The Naive Bayes approach is used to address the document classification problem with a deceptively simple model. It is applied in both a flat (linear) and a hierarchical manner to improve the efficiency of the classification model. The hierarchical classification technique was found to be more effective than flat classification, and it also performs better for multi-label document classification. Compared with the past, the amount of data generated each day has increased significantly. With the advent of smarter technologies, data therefore needs to be classified and sorted before decisions can be drawn from it. Many techniques are available for classifying documents into categories or labels. Data mining is the process of non-trivial extraction of novel, implicit, and actionable knowledge from large data sets.
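The contrast between flat and hierarchical Naive Bayes classification can be illustrated with a minimal sketch. The abstract does not state which Naive Bayes variant, tokenization, or category hierarchy was used, so everything below is an assumption made for illustration: a multinomial model with add-one (Laplace) smoothing, whitespace tokenization, a two-level hierarchy, and a toy corpus. The class names `NaiveBayes` and `HierarchicalNaiveBayes` and the `parent_of` mapping are hypothetical and are not taken from the article.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for text, label in zip(docs, labels):
            tokens = text.lower().split()        # naive whitespace tokenizer
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:              # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalNaiveBayes:
    """Two-level scheme: pick a top-level category, then a leaf label within it."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # leaf label -> top-level category

    def fit(self, docs, labels):
        parents = [self.parent_of[label] for label in labels]
        self.root = NaiveBayes().fit(docs, parents)
        self.children = {}
        for parent in set(parents):
            subset = [(d, l) for d, l in zip(docs, labels)
                      if self.parent_of[l] == parent]
            sub_docs, sub_labels = zip(*subset)
            self.children[parent] = NaiveBayes().fit(sub_docs, sub_labels)
        return self

    def predict(self, text):
        parent = self.root.predict(text)
        return self.children[parent].predict(text)


# Toy corpus, invented for this sketch only.
docs = [
    "football match goal striker",
    "football league goal win",
    "tennis racket serve match",
    "tennis serve ace court",
    "cpu chip processor silicon",
    "processor chip cache memory",
    "python code compiler bug",
    "code bug software release",
]
labels = ["football", "football", "tennis", "tennis",
          "hardware", "hardware", "software", "software"]
parent_of = {"football": "sports", "tennis": "sports",
             "hardware": "tech", "software": "tech"}

flat = NaiveBayes().fit(docs, labels)
hier = HierarchicalNaiveBayes(parent_of).fit(docs, labels)

print(flat.predict("goal football striker"))
print(hier.predict("serve ace court"))
```

In the flat setting a single model chooses among all leaf labels at once, whereas the hierarchical setting first narrows the choice to a top-level category and then applies a smaller model trained only on that category's documents. A corpus this small says nothing about the relative accuracy of the two approaches; it only shows how they differ structurally.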
