Document analysis by crosscount approach

This paper introduces a new feature for document analysis, called crosscount. The crosscount is a function of the white line segments that start at the edge of a document image. It reflects not only the contour of the image but also the periodicity of the white lines (background) and text lines in the document image. Complex printed-page layouts contain different kinds of blocks, such as textual, graphical, and tabular blocks. Of these, textual blocks show the most pronounced periodicity, because their homogeneous white lines are regularly spaced. This important property of textual blocks can be extracted with crosscount functions. Document layouts are first classified into three classes on the basis of their physical structure. The definition and properties of the crosscount function are then described. Finally, following this classification, the application of the new feature to the analysis and understanding of different types of document images is discussed.
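
The abstract does not give the formal definition of the crosscount function, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes one possible reading: for each row of a binary image, measure the length of the white run that begins at the left edge. Rows where that run spans the full width are treated as white lines, and the spacing between them gives a rough estimate of text-line periodicity. The function names (edge_white_runs, white_line_rows, dominant_period), the 0 = white / 1 = ink encoding, and the toy page are all assumptions made for this example.

```python
from collections import Counter

# Toy binary page (assumed encoding: 0 = white background, 1 = ink).
# Two-row "text lines" separated by single white lines.
PAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def edge_white_runs(image, white=0):
    """For each row, return the length of the white run starting at the left edge."""
    runs = []
    for row in image:
        length = 0
        for pixel in row:
            if pixel != white:
                break
            length += 1
        runs.append(length)
    return runs


def white_line_rows(image, white=0):
    """Return indices of rows whose edge run spans the whole row (full white lines)."""
    runs = edge_white_runs(image, white)
    return [i for i, (row, n) in enumerate(zip(image, runs)) if n == len(row)]


def dominant_period(white_rows):
    """Return the most common gap between the starts of white bands, or None."""
    # Collapse consecutive white rows into one band, keeping only its first row.
    starts = [r for i, r in enumerate(white_rows)
              if i == 0 or r != white_rows[i - 1] + 1]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]


if __name__ == "__main__":
    print(edge_white_runs(PAGE))
    print(white_line_rows(PAGE))
    print(dominant_period(white_line_rows(PAGE)))
```

In this sketch the short runs on text rows trace the left contour of the block, while the regularly spaced full-width runs expose the periodicity of a textual block. A graphical or tabular block would typically produce less regular gaps under the same assumptions.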