Paper information - Extraction method for characters in form document image

Extraction method for characters in form document image

The invention relates to image processing and computer vision, and in particular to a method for extracting characters from form document images. The method comprises five steps: (1) extracting line segments from the image using edge detection and the Hough transform; (2) estimating the skew angle of the whole image from the direction distribution of the line segments and deskewing the image; (3) connecting the line segments in the horizontal and vertical directions to locate the table cells of the form; (4) binarizing the image and segmenting each full line of characters in the table cells using the maximum between-class variance method (Otsu's method), then extracting the individual characters in the table cells with a sliding-window method; and (5) restoring missing character strokes according to statistical features of the table cell frame lines. The method is flexible, effectively handles characters that touch one another and characters that overlap form lines, and greatly reduces the impact of such touching and overlap on optical character recognition (OCR).
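Two of the steps described in the abstract can be illustrated with a minimal pure-Python sketch: skew estimation from line-segment directions (second step) and threshold selection by maximum between-class variance (fourth step). This is not the patented implementation. The function names, the `(x1, y1, x2, y2)` segment format, the folding of directions into a 90-degree range, and the use of the median as the summary of the direction distribution are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from statistics import median


def estimate_skew_angle(segments):
    """Estimate the page skew in degrees from line segments (x1, y1, x2, y2).

    Assumption: each direction is folded into [-45, 45) so that
    near-horizontal and near-vertical form lines vote for the same skew,
    and the median of the folded angles is taken as the estimate.
    """
    folded = []
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold modulo 90 degrees into the range [-45, 45).
        folded.append((angle + 45.0) % 90.0 - 45.0)
    if not folded:
        raise ValueError("at least one line segment is required")
    return median(folded)


def otsu_threshold(histogram):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the
    other. `histogram[i]` is the number of pixels with gray level i.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows, threshold):
    """Map a grayscale image (list of rows) to 1 for dark pixels, else 0."""
    return [[1 if px <= threshold else 0 for px in row] for row in gray_rows]
```

In this sketch, the segments would come from whatever edge detector and Hough transform are used in the first step, the image would be rotated by the negative of the estimated angle to correct the skew, and the histogram passed to `otsu_threshold` would be computed per table cell or per image, depending on the design.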

王俊峰 | 李虹 | 高琳 | 姬郁林