Paper information - Information extraction from scanned documents by stochastic page layout analysis

We propose a stochastic context-free grammar for extracting information from scanned document images. The grammar is designed to disambiguate layout analysis and utilize both layout and text features. We applied this grammar to the problem of extracting bibliographic information from scanned academic papers and found that it can accurately extract information.

Atsuhiro Takasu | Kenro Aihara

[1] George Nagy, et al. Hierarchical representation of optically scanned documents, 1984.

[2] Yalin Wang, et al. Table structure understanding and its performance evaluation, 2004, Pattern Recognit.

[3] Atsuhiro Takasu, et al. Mining knowledge from text using information extraction, 2005, SKDD.