Numbered sequence detection in documents

We present a method for detecting numbered sequences in a document. The method proceeds in three steps. First, all potential "numbered patterns" are automatically extracted from the document. Second, candidate coherent sequences are built by linking patterns whose numbers increment from one to the next (the incremental relation). Finally, possibly wrong links between items are corrected using the notion of optimization context. We present an evaluation of the method and discuss its weaknesses and possible improvements.
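The first two steps can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the regular expression, the function names `extract_candidates` and `build_sequences`, and the greedy linking rule are all assumptions chosen for illustration. The third step, correction using the optimization context, is not shown because the abstract does not specify it.

```python
import re
from typing import List, Tuple

# Assumed pattern (illustrative only): a line that starts with an integer
# followed by "." or ")" and at least one whitespace character.
NUMBERED = re.compile(r"^\s*(\d+)([.)])\s+")

# A candidate is (line_index, number, delimiter).
Candidate = Tuple[int, int, str]


def extract_candidates(lines: List[str]) -> List[Candidate]:
    """Step 1 (sketch): collect every line that starts with a numbered pattern."""
    found = []
    for index, line in enumerate(lines):
        match = NUMBERED.match(line)
        if match:
            found.append((index, int(match.group(1)), match.group(2)))
    return found


def build_sequences(candidates: List[Candidate]) -> List[List[Candidate]]:
    """Step 2 (sketch): link candidates by an incremental relation.

    A candidate is appended to the most recently started sequence whose last
    item has the same delimiter and a number exactly one smaller. Otherwise
    it starts a new sequence. This greedy rule is an assumption.
    """
    sequences: List[List[Candidate]] = []
    for item in candidates:
        for seq in reversed(sequences):
            _, last_number, last_delim = seq[-1]
            if last_delim == item[2] and last_number + 1 == item[1]:
                seq.append(item)
                break
        else:
            # No existing sequence can be extended by this candidate.
            sequences.append([item])
    return sequences


if __name__ == "__main__":
    sample = [
        "1. Introduction",
        "Some text.",
        "2. Method",
        "1) extract patterns",
        "2) build sequences",
        "3. Results",
    ]
    for seq in build_sequences(extract_candidates(sample)):
        print(seq)
```

On the sample lines, the sketch separates the section headings (`1.`, `2.`, `3.`) from the nested list (`1)`, `2)`). A greedy rule of this kind can still produce wrong links when two sequences share the same numbering style, which is the situation the correction step is meant to address.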
