Encoding Pathlet and SIFT Features With Bagged VLAD for Historical Writer Identification

Offline writer identification plays an important role in forensic document examination and historical document analysis. Challenges remain in historical writer identification (WI), where documents may exhibit very complex handwriting styles. In this paper, we propose novel techniques for detailed description and accurate identification of handwriting in historical documents. Because handwriting contours are among the most salient components characterizing one’s handwriting style, a novel pathlet feature is proposed to describe their rich properties beyond slant and curvature in a principled way; these properties can be exploited in a VLAD-like encoding framework for fine-grained handwriting description. In addition to the pathlet feature, we extract a unidirectional SIFT feature to describe handwriting corners and junctions. To effectively encode the pathlet and SIFT features, we further propose a novel encoding method, named bagged VLAD. It addresses the problem that a large codebook spreads the data points too sparsely and degrades performance, and thereby allows a much larger codebook to be used for improved encoding performance. Our proposed method achieves state-of-the-art performance on the ICDAR2017 Historical-WI database and the ICDAR2019 HDRC-IR database, and won first place in the ICDAR2019 HDRC-IR competition.
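The abstract does not spell out how bagged VLAD is computed, so the following is only a minimal sketch of one plausible reading: several small codebooks are each fitted on a random bootstrap sample of training descriptors, a standard VLAD vector is computed per codebook, and the vectors are concatenated. The function names (`kmeans`, `vlad`, `train_bagged_codebooks`, `bagged_vlad`), the power and L2 normalization, and all sizes are assumptions made for illustration, and random vectors stand in for real pathlet or SIFT descriptors.

```python
import numpy as np


def kmeans(X, k, rng, iters=10):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centers[j] = members.mean(axis=0)
    return centers


def vlad(desc, centers):
    """Standard VLAD: sum of residuals to the nearest centroid, per centroid."""
    d2 = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    v = np.zeros_like(centers, dtype=float)
    for j in range(len(centers)):
        members = desc[labels == j]
        if len(members):
            v[j] = (members - centers[j]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power normalization (assumed)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def train_bagged_codebooks(train_desc, n_bags, k, sample_size, seed=0):
    """Fit one small codebook per bootstrap sample (assumed bagging scheme)."""
    rng = np.random.default_rng(seed)
    books = []
    for _ in range(n_bags):
        idx = rng.choice(len(train_desc), size=sample_size, replace=True)
        books.append(kmeans(train_desc[idx], k, rng))
    return books


def bagged_vlad(desc, books):
    """Concatenate the per-codebook VLAD vectors and L2-normalize the result."""
    enc = np.concatenate([vlad(desc, c) for c in books])
    norm = np.linalg.norm(enc)
    return enc / norm if norm > 0 else enc


# Usage with synthetic data standing in for local handwriting descriptors.
rng = np.random.default_rng(1)
train = rng.normal(size=(500, 8))   # pooled training descriptors
page = rng.normal(size=(50, 8))     # descriptors from one document image
books = train_bagged_codebooks(train, n_bags=3, k=4, sample_size=200)
encoding = bagged_vlad(page, books)
print(encoding.shape)
```

Under this reading, each codebook stays small enough that its cells are reasonably populated, while the concatenation gives an effective vocabulary of `n_bags * k` centroids. Two encodings produced this way could then be compared with a cosine or Euclidean distance for retrieval.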
