A Document Image Analysis and Recognition System for Japanese Family Registration

A family registration data entry system with functions for automatic form layout analysis and character recognition has been developed. The layout analysis module first detects characters and ruled lines by using the top and bottom boundaries of smeared black components. It then determines the form layout and identifies each field in it by comparing the detected lines with predefined layout models. The character strings in each field are recognized and matched against a dictionary to check whether each sequence is plausible as a Japanese word. After an operator has examined the text data and keywords have been extracted, the data are registered in a database. The system was put into practical use for the initial entry of typed family registration forms in Tokyo's Toshima Ward, contributing to the establishment of the first computerized family registration system in Japan.
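The first layout analysis step can be pictured with a small Python sketch. This is not the authors' implementation. Several details are assumptions made for illustration: smearing is done as horizontal run-length smoothing, components are found with 4-connected labelling, and the thresholds `max_gap`, `min_len` and `max_thick` are hypothetical. A component is treated as a ruled line when it is long and its bottom boundary stays within a few pixels of its top boundary in every column. Any other component is treated as character material.

```python
from collections import deque

Image = list[list[int]]  # binary image: 1 = black pixel, 0 = white pixel


def smear_rows(img: Image, max_gap: int) -> Image:
    """Horizontal run-length smoothing (assumed smearing method): fill white
    gaps of at most max_gap pixels lying between two black pixels on a row."""
    out = [row[:] for row in img]
    for row in out:
        last_black = None
        for x, v in enumerate(row):
            if v:
                gap = x - last_black - 1 if last_black is not None else 0
                if 0 < gap <= max_gap:
                    for k in range(last_black + 1, x):
                        row[k] = 1
                last_black = x
    return out


def components(img: Image) -> list[list[tuple[int, int]]]:
    """Label 4-connected black components; each is a list of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def boundaries(pixels):
    """Per-column top (smallest y) and bottom (largest y) of a component."""
    top, bottom = {}, {}
    for y, x in pixels:
        top[x] = min(top.get(x, y), y)
        bottom[x] = max(bottom.get(x, y), y)
    return top, bottom


def classify(pixels, min_len: int, max_thick: int) -> str:
    """Hypothetical rule: long and thin in every column -> ruled line."""
    top, bottom = boundaries(pixels)
    width = max(top) - min(top) + 1
    thickness = max(bottom[x] - top[x] + 1 for x in top)
    return "ruled_line" if width >= min_len and thickness <= max_thick else "character"


if __name__ == "__main__":
    img = [[0] * 20 for _ in range(8)]
    for x in range(20):
        if x != 10:          # a one-pixel break that smearing should close
            img[1][x] = 1
    for y in range(4, 7):    # a small 3x3 blob standing in for a character
        for x in range(3, 6):
            img[y][x] = 1
    smeared = smear_rows(img, max_gap=2)
    print(sorted(classify(c, min_len=10, max_thick=2) for c in components(smeared)))
    # ['character', 'ruled_line']
```

Smearing before labelling lets a ruled line with small print breaks be found as a single component, so the top and bottom boundaries can be compared over its full length.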