Recognition of newspaper printed in Gurumukhi script

In this work, a system for recognition of newspaper printed in Gurumukhi script is presented. Four feature extraction techniques, namely, zoning features, diagonal features, parabola curve fitting based features, and power curve fitting based features are considered for extracting the statistical properties of the characters printed in the newspaper. Different combinations of these features are also applied to improve the recognition accuracy. For recognition, four classification techniques, namely, k-NN, linear-SVM, decision tree, and random forest are used. A database for the experiments is collected from three major Gurumukhi script newspapers which are Ajit, Jagbani and Punjabi Tribune. Using 5-fold cross validation and random forest classifier, a recognition accuracy of 96.19% with a combination of zoning features, diagonal features and parabola curve fitting based features has been reported. A recognition accuracy of 95.21% with a partitioning strategy of data set (70% data as training data and remaining 30% data as testing data) has been achieved.