Bag of features approach for offline text-independent Chinese writer identification

This paper studies offline, text-independent writer identification for Chinese handwriting. We adopt the Bag of Features method for Chinese writer identification and find that it performs much better than previous state-of-the-art methods. The local feature is the scale-invariant feature transform (SIFT) descriptor, chosen because it extracts local directional information from Chinese characters. Instead of Hard Voting, we encode each SIFT descriptor with two recently devised coding strategies: Improved Fisher Kernels and Locality-constrained Linear Coding. To make these coding strategies suitable for this new application area, an absolute average pooling function is used. Finally, a K-nearest-neighbor classifier identifies the author of a handwriting image. Experiments are conducted on a recently collected dataset of Chinese handwriting, CASIA Offline DB 2.1. The results show that our approach outperforms not only previous state-of-the-art methods but also the traditional Bag of Words method using Hard Voting.
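The encoding and pooling stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local descriptors (for example, 128-dimensional SIFT vectors) and a codebook have already been computed, it uses the commonly published approximated form of Locality-constrained Linear Coding (k nearest codewords with a sum-to-one constraint), and it interprets "absolute average pooling" as the mean of the absolute code values over all descriptors of an image. The function names, the parameters `k` and `reg`, and the use of a 1-nearest-neighbor rule with Euclidean distance are assumptions made for illustration. Improved Fisher Kernel encoding is omitted.

```python
import numpy as np

def _sq_dists(descriptors, codebook):
    # Squared Euclidean distance between every descriptor and every codeword.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2)

def hard_voting(descriptors, codebook):
    # Baseline: each descriptor votes for its single nearest codeword.
    nearest = _sq_dists(descriptors, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def llc_codes(descriptors, codebook, k=5, reg=1e-4):
    # Approximated LLC: reconstruct each descriptor from its k nearest
    # codewords, with coefficients constrained to sum to one.
    n, m = len(descriptors), len(codebook)
    idx = np.argsort(_sq_dists(descriptors, codebook), axis=1)[:, :k]
    codes = np.zeros((n, m))
    for i in range(n):
        z = codebook[idx[i]] - descriptors[i]              # shift to origin
        cov = z @ z.T                                      # local covariance
        cov += (reg * np.trace(cov) + 1e-12) * np.eye(k)   # regularize
        w = np.linalg.solve(cov, np.ones(k))
        codes[i, idx[i]] = w / w.sum()
    return codes

def absolute_average_pooling(codes):
    # Assumed definition: average of absolute code values per codeword.
    return np.abs(codes).mean(axis=0)

def nearest_neighbor(query, gallery, labels):
    # 1-NN over pooled image representations (Euclidean distance assumed).
    dists = np.linalg.norm(gallery - query, axis=1)
    return labels[int(dists.argmin())]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(8, 16))        # stand-in for a learned codebook
    descriptors = rng.normal(size=(50, 16))    # stand-in for SIFT descriptors
    pooled = absolute_average_pooling(llc_codes(descriptors, codebook, k=3))
    print(pooled.shape)
```

Taking absolute values before averaging matters here because LLC coefficients can be negative, so a plain average could let contributions from different descriptors cancel out.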
