Table recognition for automated document entry system
Most documents include various layout objects, such as headlines, text lines, charts, and tables. Tables in particular are powerful tools that make large quantities of data easy to understand. An automated document entry system therefore needs to recognize the layout objects in a document and extract the information held in its tables. This paper describes an effective table recognition method. The proposed method consists of three steps: (1) document layout structure recognition, (2) table layout structure recognition, and (3) table content recognition. To develop the table layout structure recognition step, we first examined the layout structure of tables in existing documents and classified several common structures. From this examination we derived ten rules and designed a ruled line and box extraction algorithm based on them. Experiments confirmed the effectiveness of the proposed method. The method should therefore contribute substantially to building an automated document entry system that recognizes documents faster and extracts the data contained in tables.
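As a rough illustration of what ruled line and box extraction involves, the following minimal sketch finds long runs of black pixels in a binarized image and pairs neighbouring lines into cells. It is not the paper's algorithm and does not implement its ten rules, which the abstract does not list. The function names, the `min_len` threshold, and the assumptions of one-pixel-thick, unskewed lines are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]            # 1 = black pixel, 0 = white pixel
Segment = Tuple[int, int, int]     # (fixed coordinate, start, end), end inclusive
Box = Tuple[int, int, int, int]    # (top, left, bottom, right)


def horizontal_runs(img: Image, min_len: int) -> List[Segment]:
    """Return (row, x_start, x_end) for each black run of at least min_len pixels."""
    runs: List[Segment] = []
    for y, row in enumerate(img):
        x, width = 0, len(row)
        while x < width:
            if row[x]:
                start = x
                while x < width and row[x]:
                    x += 1
                if x - start >= min_len:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def vertical_runs(img: Image, min_len: int) -> List[Segment]:
    """Return (column, y_start, y_end) by scanning the transposed image."""
    transposed = [list(col) for col in zip(*img)]
    return horizontal_runs(transposed, min_len)


def _covers(segments: List[Segment], fixed: int, lo: int, hi: int) -> bool:
    """True if some segment on line `fixed` spans the whole interval [lo, hi]."""
    return any(c == fixed and s <= lo and e >= hi for c, s, e in segments)


def find_boxes(img: Image, min_len: int = 3) -> List[Box]:
    """Return cells bounded on all four sides by neighbouring ruled lines.

    Only adjacent line pairs are tested, so merged cells that span
    several rows or columns are not reported by this sketch.
    """
    h_lines = horizontal_runs(img, min_len)
    v_lines = vertical_runs(img, min_len)
    ys = sorted({y for y, _, _ in h_lines})
    xs = sorted({x for x, _, _ in v_lines})
    boxes: List[Box] = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            if (_covers(h_lines, top, left, right)
                    and _covers(h_lines, bottom, left, right)
                    and _covers(v_lines, left, top, bottom)
                    and _covers(v_lines, right, top, bottom)):
                boxes.append((top, left, bottom, right))
    return boxes
```

On a real scan, line thickness, skew, and broken strokes would need handling before a run-length pass like this is usable; rules such as those the paper derives from observed table layouts would presumably address cases this sketch ignores.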