A Method for Detecting Document Orientation Using a Naïve Bayes Classifier

This paper proposes an approach to document orientation detection and classification based on a Naïve Bayes classifier. First, all characters in a document image are isolated and a subset of valid characters is selected. Using the valid characters, the document image is converted into a 32-dimensional vector. A Gaussian distribution function is used to calculate the likelihood of each dimension under each orientation class, and from these the posterior probability of the query document image for each class is calculated. Finally, the orientation of the document is taken to be the class with the highest posterior probability. Experimental results show that the accuracy of the proposed method is considerably higher than that of a Bray-Curtis distance approach, even on some poorer-quality samples.
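The classification step described above can be sketched as a Gaussian Naïve Bayes classifier over 32-dimensional vectors. This is a minimal illustration, not the authors' implementation: the abstract does not specify the orientation classes, the vectorization, or the training data, so the four classes (0, 90, 180, 270 degrees), the synthetic vectors, the variance floor, and the function names `fit_gaussian_nb` and `predict_orientation` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a per-class, per-dimension Gaussian plus class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # var_floor (an assumption) avoids division by zero for constant dimensions.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_floor
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, variances, priors

def predict_orientation(x, model):
    """Return the class with the highest (log) posterior for one vector x."""
    classes, means, variances, priors = model
    # Log of the Gaussian density for every class and every dimension.
    log_lik = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)
    # Naive independence assumption: sum log-likelihoods over the dimensions.
    log_post = np.log(priors) + log_lik.sum(axis=1)
    return classes[np.argmax(log_post)]

# Synthetic stand-in for vectorized documents (hypothetical data, 32 dimensions).
rng = np.random.default_rng(0)
orientations = np.array([0, 90, 180, 270])  # assumed class labels, in degrees
centers = np.arange(4)[:, None] * 2.0 * np.ones((1, 32))
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 32)) for c in centers])
y = np.repeat(orientations, 50)

model = fit_gaussian_nb(X, y)
query = centers[2] + rng.normal(scale=0.5, size=32)
print(predict_orientation(query, model))
```

Working in log space avoids numerical underflow when 32 small per-dimension probabilities are multiplied together; the class with the highest log posterior is the same as the class with the highest posterior.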
