A Method Combining Syntax Analysis and Correction Rules to Re-construct the Re-flowable Document

To address the limited fault tolerance of existing methods for reconstructing the structure of re-flowable documents, a new method combining left-corner parsing with correction rules is proposed. An XML schema is used to construct a syntax tree of the typesetting rules for document components, and the left-corner method is used to analyze the logical components of the document under the guidance of this syntax tree. During the analysis, the correction rules are applied to correct possible errors in the document components and eventually obtain the most likely document structure. The results show that the algorithm effectively improves both fault tolerance in document structure reconstruction and the accuracy of document structure recognition, providing a foundation for document understanding and format checking.
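The idea of combining left-corner analysis with correction rules can be illustrated with a minimal sketch. It is not the authors' implementation. The grammar, the component labels (`title`, `heading`, `para`), the correction table, and its costs are all hypothetical; in the paper the grammar would be derived from an XML schema. Here "most likely structure" is approximated as the relabelling with the lowest total correction cost that still parses.

```python
# Hypothetical typesetting grammar (assumption: hand-written here,
# whereas the paper derives it from an XML schema).
GRAMMAR = [
    ("Document", ("title", "Body")),
    ("Body", ("Section", "Body")),
    ("Body", ("Section",)),
    ("Section", ("heading", "Paras")),
    ("Paras", ("para", "Paras")),
    ("Paras", ("para",)),
]

# Hypothetical correction rules: observed label -> [(replacement, cost)].
CORRECTIONS = {
    "para": [("heading", 1), ("title", 2)],
    "heading": [("para", 1), ("title", 1)],
    "title": [("heading", 1)],
}


def candidates(label):
    """The observed label at cost 0, then its possible corrections."""
    return [(label, 0)] + CORRECTIONS.get(label, [])


def parse(goal, tokens, i):
    """Yield (end, cost, labels) for each way tokens[i:end] can derive goal."""
    if i < len(tokens):
        for label, cost in candidates(tokens[i]):
            # Left-corner step: start bottom-up from the next component.
            yield from climb(label, goal, tokens, i + 1, cost, (label,))


def climb(cat, goal, tokens, i, cost, labels):
    """cat is recognised up to position i; project it upward toward goal."""
    if cat == goal:
        yield i, cost, labels
    for lhs, rhs in GRAMMAR:
        if rhs[0] == cat:  # cat is the left corner of this rule
            for j, c, ls in match_rest(rhs[1:], tokens, i, cost, labels):
                yield from climb(lhs, goal, tokens, j, c, ls)


def match_rest(symbols, tokens, i, cost, labels):
    """Recognise the remaining right-hand-side symbols top-down."""
    if not symbols:
        yield i, cost, labels
        return
    for j, c, ls in parse(symbols[0], tokens, i):
        yield from match_rest(symbols[1:], tokens, j, cost + c, labels + ls)


def reconstruct(tokens):
    """Return (corrected labels, cost) of the cheapest full parse, or None."""
    full = [(c, ls) for j, c, ls in parse("Document", tokens, 0)
            if j == len(tokens)]
    if not full:
        return None
    cost, labels = min(full)
    return list(labels), cost


if __name__ == "__main__":
    # The second component was recognised as a paragraph instead of a heading.
    print(reconstruct(["title", "para", "para", "para"]))
```

In this sketch a heading that was misrecognised as a paragraph is relabelled because that is the cheapest change that lets the sequence parse as a `Document`. The exhaustive search is exponential in the worst case, so it is suitable only for illustrating the principle on short inputs.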
