Automated borders detection and adaptive segmentation for binary document images

This paper describes two new and effective algorithms: one that detects page borders in documents available as binary images, and an adaptive, bottom-up segmentation algorithm that divides binary images into blocks. The border detection algorithm relies on classifying rows and columns as blank, textual, or non-textual, on object segmentation, and on an analysis of projection profiles and crossing counts. Segmentation is performed by an adaptive smearing technique. It differs from all previous bottom-up approaches in that every decision to merge or separate regions is based on font information estimated from the binary document image.
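
The quantities named above can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows what projection profiles, crossing counts, and a run-length style of smearing compute on a binary image stored as a list of rows (1 = black, 0 = white). All function names, the `font_height` argument, and the `factor` parameter are assumptions made for illustration. In particular, tying the smearing gap to a single estimated font height is a stand-in for whatever font information the paper actually estimates.

```python
from typing import List

Image = List[List[int]]  # 1 = black (ink), 0 = white (background)


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: black-pixel count of each row."""
    return [sum(row) for row in img]


def column_profile(img: Image) -> List[int]:
    """Vertical projection profile: black-pixel count of each column."""
    return [sum(col) for col in zip(*img)]


def crossing_counts(img: Image) -> List[int]:
    """Number of white-to-black transitions in each row."""
    counts = []
    for row in img:
        prev = 0  # treat the area left of the image as white
        n = 0
        for px in row:
            if px == 1 and prev == 0:
                n += 1
            prev = px
        counts.append(n)
    return counts


def smear_row(row: List[int], max_gap: int) -> List[int]:
    """Fill white runs of length <= max_gap lying between two black pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_black is not None and 0 < i - last_black - 1 <= max_gap:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def smear_horizontal(img: Image, font_height: int, factor: float = 1.0) -> Image:
    """Smear every row with a gap threshold scaled by an estimated font height.

    Assumption: the threshold is factor * font_height; the paper's actual
    rule for deriving thresholds from font information is not given here.
    """
    max_gap = max(1, int(round(factor * font_height)))
    return [smear_row(row, max_gap) for row in img]
```

In this sketch a row with a zero profile would count as blank, while rows with many crossings relative to their black-pixel count are more likely to be textual. Making the smearing threshold depend on the estimated font size, rather than on a fixed constant, is what lets small and large type merge into blocks at appropriate distances.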