Automatic reading of the white pages in a telephone directory
An optical character recognition (OCR) system for reading the white pages of Sydney’s telephone directories is described. The system processes each scanned page automatically in three stages. First, it performs column segmentation, special symbol segmentation, text line segmentation, and character segmentation. Second, a new structural method recognizes each segmented character based on the decomposition and coding of its skeleton. Third, a postprocessing module verifies easily confused letters using text layout information and contextual information. Experiments with test pages show an average recognition rate of 99.5% with a reliability rate of 99.85%.
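
The abstract does not say how the segmentation steps are implemented. As an illustration only, the sketch below shows one common way to split a binarized page into text lines and then into characters, using projection profiles (counts of ink pixels per row or per column). This is an assumed, generic technique and not necessarily the authors' method; the names `find_runs`, `segment_lines`, and `segment_characters` and the `min_ink` parameter are hypothetical.

```python
from typing import List, Tuple

Page = List[List[int]]  # binarized image: 1 = ink pixel, 0 = background


def find_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile >= min_ink."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i  # a run of inked rows/columns begins
        elif value < min_ink and start is not None:
            runs.append((start, i))  # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def segment_lines(page: Page) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in page]
    return find_runs(row_profile)


def segment_characters(page: Page, line: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Split one text line into characters using the vertical projection profile."""
    top, bottom = line
    width = len(page[0]) if page else 0
    col_profile = [
        sum(page[r][c] for r in range(top, bottom)) for c in range(width)
    ]
    return find_runs(col_profile)


# Tiny synthetic page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
]

lines = segment_lines(page)
chars_per_line = [segment_characters(page, line) for line in lines]
```

A profile-based split like this only works when lines and characters are separated by blank rows or columns. Touching or overlapping characters, as well as the column and special symbol segmentation mentioned in the abstract, would need additional handling that is not shown here.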