Conversion of PDF documents into HTML: a case study of document image analysis
Portable Document Format (PDF) has become the de facto standard in many fields because it is independent of local formatting restrictions and reproduces documents accurately. HTML documents, meanwhile, are becoming an integral part of our lives as the dominant form of information exchange on the World Wide Web. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows these documents to be converted into HTML with their logical and physical layout kept intact.
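
The abstract does not say which image-processing techniques the paper uses. As an illustration only, the sketch below shows one common way to find text columns on a rasterized page: a vertical projection profile, where runs of empty pixel columns are treated as gutters. The function name `find_columns`, the `min_gap` parameter, and the binary list-of-lists page representation are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def find_columns(page: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns in a binary page image.

    page[y][x] is 1 for ink and 0 for background (hypothetical input format).
    A run of at least min_gap empty pixel columns is treated as a gutter.
    The end of each range is exclusive.
    """
    if not page:
        return []
    width = len(page[0])

    # Vertical projection profile: amount of ink in each pixel column.
    profile = [sum(row[x] for row in page) for x in range(width)]

    columns: List[Tuple[int, int]] = []
    start = None  # x position where the current text column began
    gap = 0       # length of the current run of empty pixel columns

    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The gutter is wide enough: close the current text column.
                columns.append((start, x - gap + 1))
                start = None
                gap = 0

    # Close a column that runs to (or near) the right edge of the page.
    if start is not None:
        columns.append((start, width - gap))
    return columns
```

In a conversion pipeline of the kind the abstract describes, each detected range could then be mapped to its own HTML container so that the column order of the page is preserved. That mapping is likewise an assumption of this sketch.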