Recognition And Data Extraction Of Form Documents Based On Three Types Of Line Segments

Abstract Almost all form documents contain line segments. In this paper, we propose an efficient method to recognize form documents that contain at least one line segment. Our method is based on an efficient representation model of the form, which uses three types of line segments to represent a form. All line segments are normalized and sorted after they are extracted. The normalization and sorting not only solve the form scaling problem but also provide a unified and efficient way of matching forms against each other. To make the recognition method more robust, fuzzy matching is used. With this representation model, recognizing a skewed form requires rotating only the line segments and the data fields rather than the whole form image. Experimental results show the effectiveness and efficiency of the method.
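To make the pipeline outlined in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define the three segment types, the normalization formula, the sort order, or the fuzzy matching rule, so everything here is an assumption chosen for illustration: segments are plain endpoint tuples with no type label, normalization rescales by the bounding box of all segments, sorting is lexicographic on rounded coordinates, and "fuzzy" means a per-coordinate tolerance `tol`. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# A segment is (x1, y1, x2, y2). Segment types are not modeled here (assumption).
Segment = Tuple[float, float, float, float]


def _key(values: Tuple[float, ...]) -> Tuple[float, ...]:
    """Rounded coordinates, so tiny numeric noise does not change the ordering."""
    return tuple(round(v, 3) for v in values)


def rotate_segments(segments: List[Segment], angle_deg: float) -> List[Segment]:
    """Rotate only the segment endpoints about the origin (counter-clockwise)."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [
        (x1 * c - y1 * s, x1 * s + y1 * c, x2 * c - y2 * s, x2 * s + y2 * c)
        for x1, y1, x2, y2 in segments
    ]


def normalize_and_sort(segments: List[Segment]) -> List[Segment]:
    """Map segments into a unit-scale frame and put them in a canonical order."""
    xs = [v for x1, _, x2, _ in segments for v in (x1, x2)]
    ys = [v for _, y1, _, y2 in segments for v in (y1, y2)]
    min_x, min_y = min(xs), min(ys)
    # One scale factor for both axes keeps the aspect ratio of the form.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    normalized = []
    for x1, y1, x2, y2 in segments:
        p = ((x1 - min_x) / scale, (y1 - min_y) / scale)
        q = ((x2 - min_x) / scale, (y2 - min_y) / scale)
        a, b = sorted((p, q), key=_key)  # canonical endpoint order
        normalized.append(a + b)
    return sorted(normalized, key=_key)


def fuzzy_match(a: List[Segment], b: List[Segment], tol: float = 0.02) -> bool:
    """Two normalized forms match if every coordinate agrees within tol."""
    if len(a) != len(b):
        return False
    return all(abs(u - v) <= tol for sa, sb in zip(a, b) for u, v in zip(sa, sb))
```

Under these assumptions, a form scanned at a different size or offset normalizes to the same segment list as its template, and a skewed scan can be corrected by calling `rotate_segments` with the negative of the estimated skew angle before normalizing. Estimating that angle is outside the scope of this sketch, and a positional comparison after sorting would fail if a segment were missed or spuriously detected, which a real matcher would need to tolerate.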