Cross Domain Assessment of Document to HTML Conversion Tools to Quantify Text and Structural Loss during Document Analysis
[1] Yanhui Feng, et al. Using HTML Tags to Improve Parallel Resources Extraction, 2011, 2011 International Conference on Asian Language Processing.
[2] Ge Yu, et al. A Study on Information Extraction from PDF Files, 2005, ICMLC.
[3] Jie Zou, et al. Combining DOM tree and geometric layout analysis for online medical journal article segmentation, 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[4] Sarang Pitale, et al. Information Extraction Tools for Portable Document Format, 2011.
[5] F. Rahman, et al. Conversion of PDF documents into HTML: a case study of document image analysis, 2003, The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.
[6] Chengjie Sun, et al. A Block Segmentation Based Approach for Web Information Extraction, 2010, 2010 International Conference on Asian Language Processing.
[7] Jer Lang Hong, et al. ViWER- data extraction for search engine results pages using visual cue and DOM Tree, 2010, 2010 International Conference on Information Retrieval & Knowledge Management (CAMP).
[8] Erik G. Learned-Miller, et al. Learning on the Fly: Font-Free Approaches to Difficult OCR Problems, 2009, 2009 10th International Conference on Document Analysis and Recognition.