Stroke Extraction for Offline Handwritten Mathematical Expression Recognition

Offline handwritten mathematical expression recognition is often considered much harder than its online counterpart because temporal information is absent. To take advantage of the more mature methods for online recognition and to save resources, an oversegmentation approach is proposed that automatically recovers strokes from textual bitmap images. The proposed algorithm first breaks the skeleton of a binarized image down into junctions and segments, then merges segments to form strokes, and finally normalizes stroke order using recursive projection and topological sort. Good offline accuracy was obtained in combination with ordinary online recognizers that were not specially designed for extracted strokes. Given a ready-made state-of-the-art online handwritten mathematical expression recognizer, the proposed procedure correctly recognized 58.22%, 65.65%, and 65.22% of the offline formulas rendered from the datasets of the Competitions on Recognition of Online Handwritten Mathematical Expressions (CROHME) in 2014, 2016, and 2019, respectively. Furthermore, given a trainable online recognition system, retraining it with extracted strokes resulted in an offline recognizer with the same level of accuracy. In addition, the entire pipeline was fast enough to facilitate on-device recognition on mobile phones with limited resources. To conclude, stroke extraction provides an attractive way to build optical character recognition software.
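The first step of the pipeline, splitting a skeleton into junctions and segments, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-pixel-wide skeleton stored as a list of 0/1 rows and uses the classical crossing-number test, which counts 0-to-1 transitions around a pixel's 8-neighbour ring. The names `RING`, `crossing_number`, `classify_skeleton`, and `PLUS` are hypothetical and chosen only for this example.

```python
# Offsets of the 8 neighbours in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(img, r, c):
    """Count 0 -> 1 transitions while walking once around pixel (r, c)."""
    def at(dr, dc):
        rr, cc = r + dr, c + dc
        # Pixels outside the image are treated as background.
        if 0 <= rr < len(img) and 0 <= cc < len(img[0]):
            return img[rr][cc]
        return 0

    ring = [at(dr, dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def classify_skeleton(img):
    """Return (endpoints, junctions) of a skeleton image in row-major order.

    Assumption for this sketch: crossing number 1 marks an endpoint,
    3 or more marks a junction, and 2 marks an ordinary segment pixel.
    """
    endpoints, junctions = [], []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(img, r, c)
            if cn == 1:
                endpoints.append((r, c))
            elif cn >= 3:
                junctions.append((r, c))
    return endpoints, junctions


# A plus-shaped skeleton: four arms meeting at the centre pixel.
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

endpoints, junctions = classify_skeleton(PLUS)
print(endpoints)  # [(0, 2), (2, 0), (2, 4), (4, 2)]
print(junctions)  # [(2, 2)]
```

In this sketch, removing the junction pixels would leave four disconnected arms. These correspond to the segments that a later step would merge into strokes.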
