Reading and Learning

Skewed feeds occur mainly with fast production scanners. They are a mechanical problem of most feeders when pages are fed at high speed. To improve image quality, a fast deskew algorithm is needed that removes the skew electronically. First the skew of the document must be found; then it can be corrected. This chapter shows how to find the skew, even if the document is damaged, and presents several methods for deskewing the document.

1 How to Find Out the Skew Angle

Before a skew can be corrected, the skew angle must be determined. There are two methods:

Method 1: Searching the image contents for horizontal or vertical structures

Advantages: No brightness difference is required between document and background. Documents without any scanned background can also be processed.

Disadvantages: A very complicated algorithm is required to find horizontal or vertical structures reliably on different originals, and even then some wrong angles remain. The algorithm returns only an angle, not the outer borders of the scanned document.

Method 2: Searching for the borders of the scanned document

Advantages: Finding the borders of a document is far easier, faster and more reliable than finding structures in the document. In addition to the angle, the outer borders of the scanned document are also detected.

Disadvantages: The document must be scanned "oversize"; an additional rim of about 2 cm is sensible. There must be a difference in brightness or colour between scanner background and document.

With production scanners, border finding is the obvious choice because its advantages predominate. The supposed disadvantage of scanning oversize is often regarded as an advantage. On the one hand, a slightly larger image format must be captured anyway if tilted documents are expected and no information is to be lost at the rim.
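How much larger the scan format has to be depends on the skew angle. As a rough sketch (the function name and the sample angles are illustrative assumptions, not taken from this chapter), the axis-aligned bounding box of a page of width w and height h rotated by an angle θ measures w·|cos θ| + h·|sin θ| by w·|sin θ| + h·|cos θ|:

```python
import math


def bounding_box_mm(width_mm, height_mm, skew_deg):
    """Axis-aligned bounding box of a width x height page rotated by skew_deg."""
    t = math.radians(skew_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return width_mm * c + height_mm * s, width_mm * s + height_mm * c


A4_W, A4_H = 210.0, 297.0  # A4 page size in millimetres

# Sample skew angles chosen for illustration only.
for angle in (1, 3, 5):
    w, h = bounding_box_mm(A4_W, A4_H, angle)
    print(f"{angle} deg skew: extra width {w - A4_W:.1f} mm, "
          f"extra height {h - A4_H:.1f} mm")
```

For small angles the extra width is roughly h·sin θ, so the rim that is needed grows with the length of the page as well as with the skew.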
If, for example, A4 documents are scanned at exactly A4 size, their corners are cut off whenever the document is skewed.

D. Woitha and D. Janich

On the other hand, a larger image format can be set once, and all smaller formats can then be scanned without changing the parameters. Border finding and subsequent deskewing then deliver what is wanted in principle: "a straight document without any additional rim".

The necessary difference between background and document can usually be achieved without problems. A black background mostly gives enough contrast to the original. If the border search starts from a bitonal document, problems may arise when dark elements lie at the rim of the image. In this case, precautions have to be taken when searching for borders.

2 The Practice of Border Finding

The sample image below illustrates border finding; it shows the detected border pixels and the rectangle derived from them.

Fig. 1. Skewed Document

The algorithm implemented for border finding is described in more detail below. It is based on a light document on a black background, as this is the case that occurs most often in practice. The document has some typical problem areas; their processing will be detailed later:

- Non-clean background (dust, for example)
- Black beam at the document border
- Torn corner
- Uninterrupted white line (dust or CCD error, for example)

[Figure labels: Dust, Black Beam, White Line]
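To make the idea of border pixels concrete, the following is a minimal sketch for a light document on a black background. It is not the algorithm described in this chapter. It assumes a much simpler approach: the first pixel above a fixed threshold in each row is taken as a left-border pixel, and a least-squares line through the middle half of those pixels gives the angle. The function names, the threshold of 128 and the synthetic test page are all assumptions made for illustration, and none of the problem areas listed above (dust, black beam, torn corner, white line) is handled.

```python
import math


def synthetic_page(height=400, width=200, half_h=150, half_w=60, skew_deg=3.0):
    """Hypothetical test image: white rectangle (255) rotated on black (0)."""
    t = math.radians(skew_deg)
    c, s = math.cos(t), math.sin(t)
    image = []
    for y in range(height):
        dy = y - height / 2
        row = []
        for x in range(width):
            dx = x - width / 2
            u = dx * c - dy * s  # coordinate across the page
            v = dx * s + dy * c  # coordinate along the page
            row.append(255 if abs(u) <= half_w and abs(v) <= half_h else 0)
        image.append(row)
    return image


def left_border_pixels(image, threshold=128):
    """Return (row, col) of the first pixel brighter than threshold per row."""
    points = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                points.append((y, x))
                break
    return points


def skew_from_left_border(points):
    """Least-squares fit col = a * row + b; returns atan(a) in degrees.

    Only the middle half of the rows is used, assuming a small skew so that
    these rows lie on the long left edge and not on the top or bottom edge.
    """
    n = len(points)
    mid = points[n // 4: n - n // 4]
    if len(mid) < 2:
        raise ValueError("not enough border pixels to fit a line")
    mean_y = sum(y for y, _ in mid) / len(mid)
    mean_x = sum(x for _, x in mid) / len(mid)
    num = sum((y - mean_y) * (x - mean_x) for y, x in mid)
    den = sum((y - mean_y) ** 2 for y, _ in mid)
    return math.degrees(math.atan(num / den))


page = synthetic_page(skew_deg=3.0)
estimate = skew_from_left_border(left_border_pixels(page))
print(f"estimated skew: {estimate:.2f} degrees")
```

Restricting the fit to the middle rows is a design choice of this sketch: near the leftmost corner the first bright pixel of a row lies on the top or bottom edge, not on the left edge, and would bias the line. A single speck of dust to the left of the page would still produce a false border pixel, which is why the problem areas need separate treatment.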