Improving Word Spotting System Performance using Ensemble Classifier Combination Methods

Effective retrieval of information from scanned handwritten documents is becoming essential as the volume of digitized documents grows, so efficient means of analyzing and recognizing these documents are of significant interest. One such method is word spotting, which has recently become an active research area. Various ensemble classifiers have been proposed to improve the performance of pattern recognition and word spotting systems. In this paper, we propose an enhanced internal structure for the Arabic handwritten word spotting hierarchical classifier. In addition, we propose two ensemble classifier combination methods to improve the performance of closed-lexicon word spotting systems: 1) the improved score word matching method, and 2) the score evaluation method. Both methods calculate a new score from the confidence values (scores) given by the combined classifiers. Support Vector Machines (SVM) and Regularized Discriminant Analysis (RDA) are used to implement the proposed ensemble classifier. The proposed methods were tested on the CENPARMI Arabic handwritten documents database, and the results show that combining classifiers significantly improves word spotting performance. The precision rate increased by 4% with the improved score matching method and by 17% with the score evaluation method.
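As a rough illustration of score-level classifier combination in a closed-lexicon setting, the sketch below fuses the per-word confidence scores of two classifiers with a generic weighted sum rule. It is not the improved score word matching method or the score evaluation method, whose formulas are not given here; the function names, the min-max normalization, the equal weighting, and the example scores are all assumptions made for illustration.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one classifier's scores to [0, 1] so they are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: the classifier expresses no preference.
        return {word: 0.0 for word in scores}
    return {word: (s - lo) / (hi - lo) for word, s in scores.items()}


def fuse_scores(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    weight_a: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized scores (a generic sum rule, assumed here)."""
    norm_a = min_max_normalize(scores_a)
    norm_b = min_max_normalize(scores_b)
    return {
        word: weight_a * norm_a[word] + (1.0 - weight_a) * norm_b[word]
        for word in norm_a
    }


def best_match(fused: Dict[str, float]) -> str:
    """Return the lexicon word with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical confidence values for one word image over a three-word lexicon.
svm_scores = {"word_a": 2.1, "word_b": -0.4, "word_c": 0.3}
rda_scores = {"word_a": 0.7, "word_b": 0.2, "word_c": 0.6}

fused = fuse_scores(svm_scores, rda_scores)
print(best_match(fused))
```

Normalizing before fusing matters because SVM decision values and discriminant-analysis scores generally live on different scales; without it, one classifier could dominate the combined score.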