Paper information - A rule-based system for document image segmentation

A rule-based system for document image segmentation

A rule-based system for automatically segmenting a document image into text and nontext regions is presented. The initial stages of the system perform image enhancement functions such as adaptive thresholding, morphological processing, and skew detection and correction. The segmentation process consists of three steps: smearing the original image with the run length smoothing algorithm, calculating the locations and statistics of the connected components, and filtering (segmenting) the image based on these statistics. The text regions can be converted to a computer-searchable form by an optical character reader, and the nontext regions can be extracted and preserved. The rule-based structure makes it easy to fine-tune the algorithmic steps to produce robust rules, to incorporate additional tools as they become available, and to handle special segmentation needs.
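The three segmentation steps named in the abstract (smearing, connected component statistics, rule-based filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 4-connectivity choice, the horizontal-only smearing, the thresholds, and the single classification rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque


def rlsa_row(row, threshold):
    """Fill white (0) runs of length <= threshold that lie between two black (1) pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply the row-wise smearing to every row (horizontal pass only, an assumption)."""
    return [rlsa_row(row, threshold) for row in image]


def connected_components(image):
    """Return bounding box and pixel count for each 4-connected black component."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right, area = r, r, c, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append({
                "top": top,
                "left": left,
                "height": bottom - top + 1,
                "width": right - left + 1,
                "area": area,
            })
    return components


def classify(component, max_text_height=3):
    """Hypothetical rule: short, wide blocks are text; everything else is nontext."""
    short = component["height"] <= max_text_height
    wide = component["width"] > component["height"]
    return "text" if short and wide else "nontext"


# Toy page: three "character" pixels on the first row and a solid 4x4 block below.
page = [
    [1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
]
smeared = rlsa_horizontal(page, threshold=2)
labels = [classify(c) for c in connected_components(smeared)]
print(labels)
```

On this toy page the smearing merges the three isolated pixels of the first row into one short, wide block, which the hypothetical rule labels as text, while the tall square block is labeled nontext. A rule-based system of the kind described would replace `classify` with a larger, tunable set of rules over the component statistics.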
