Separator and content based approach for table extraction in handwritten chemistry documents

In this paper we present an approach to table structure extraction in handwritten chemistry documents that combines separator-line detection with content analysis. A first module, based on the Hough transform, detects all graphic lines in a document, and the resulting grid is analyzed to find the cell boundaries. When these lines are absent, a second module uses content information to define the boundaries between cells. The digits, which are the dominant components of the tables handled, are identified using a multistage classification system. The resulting map of digit positions is then analyzed with syntactic rules to find the cell boundaries. The proposed method has been tested on a set of handwritten chemistry documents, and the experimental results indicate satisfactory performance.
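As an illustration of the first module only, the following is a minimal sketch of Hough-style voting for ruling lines, followed by a grid-to-cell step. It is not the implementation described in the paper. It assumes a clean binary image given as a set of (x, y) foreground pixels, restricts the angles to 0 and 90 degrees, and uses hypothetical names (`hough_lines`, `cells_from_grid`, `demo_grid`) and an arbitrary vote threshold.

```python
import math
from collections import Counter


def hough_lines(pixels, thetas_deg=(0, 90), min_votes=15):
    """Accumulate votes in (rho, theta) space and keep well-supported lines.

    Each foreground pixel (x, y) votes for rho = x*cos(theta) + y*sin(theta).
    Restricting theta to 0 and 90 degrees is an assumption of this sketch.
    """
    votes = Counter()
    for x, y in pixels:
        for t in thetas_deg:
            theta = math.radians(t)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return sorted(key for key, count in votes.items() if count >= min_votes)


def cells_from_grid(lines):
    """Turn detected lines into cell boxes (x0, y0, x1, y1), row by row."""
    xs = sorted(rho for rho, t in lines if t == 0)   # theta = 0: vertical line x = rho
    ys = sorted(rho for rho, t in lines if t == 90)  # theta = 90: horizontal line y = rho
    return [(x0, y0, x1, y1)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]


def demo_grid(size=20, step=10):
    """Synthetic 2x2 table: ruling lines every `step` pixels (illustrative data)."""
    pixels = set()
    for k in range(0, size + 1, step):
        for i in range(size + 1):
            pixels.add((k, i))  # vertical ruling at x = k
            pixels.add((i, k))  # horizontal ruling at y = k
    return pixels


if __name__ == "__main__":
    lines = hough_lines(demo_grid())
    for cell in cells_from_grid(lines):
        print(cell)
```

On real handwritten pages the rulings are skewed, broken, or hand-drawn, so a practical version would need a finer angle range, tolerance when merging nearby peaks, and a fallback to the content-based module when too few lines are found.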
