Ontology-Based Information Extraction of Crop Diseases on Chinese Web Pages

This paper proposes a method for extracting crop disease information from Chinese web pages. First, we define a set of special DOM tree [1] labels that partition a web page into content blocks. Noise content is then eliminated according to the location and word count of each content block. Information is extracted from the content blocks with an ontology-based approach, and the crop disease ontology is constructed top-down. During extraction, the concepts, relations, and instances of the ontology are used to extract entities. Events are extracted by an optimal classification of the paragraph groups within a content block. Experiments demonstrate that the performance of the proposed method is satisfactory.
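The block partitioning and noise elimination steps can be illustrated with a minimal sketch. It is not the paper's implementation: the tag set `BLOCK_TAGS`, the function names `split_blocks` and `drop_noise`, the thresholds, and the use of character counts in place of word counts (to avoid needing a Chinese word segmenter) are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed set of DOM labels that delimit content blocks (not from the paper).
BLOCK_TAGS = {"div", "p", "td", "li", "h1", "h2", "h3"}
# Text inside these tags is never page content.
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Collects the text between block-level tags as separate content blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip_depth = 0

    def _flush(self):
        # Close the current block, keeping it only if it holds visible text.
        text = "".join(self._buf).strip()
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def split_blocks(html):
    """Partition an HTML string into a list of content block texts."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def drop_noise(blocks, min_chars=10, edge_min_chars=20, edge_fraction=0.2):
    """Filter blocks by location and length (hypothetical rule).

    A block is treated as noise if it has fewer than `min_chars`
    non-whitespace characters, or if it lies in the first or last
    `edge_fraction` of the page and has fewer than `edge_min_chars`.
    """
    n = len(blocks)
    edge = max(1, int(n * edge_fraction))
    kept = []
    for i, text in enumerate(blocks):
        length = len("".join(text.split()))
        at_edge = i < edge or i >= n - edge
        threshold = edge_min_chars if at_edge else min_chars
        if length >= threshold:
            kept.append(text)
    return kept


if __name__ == "__main__":
    page = (
        "<div>首页 | 登录</div>"
        "<div><p>小麦锈病是由锈菌引起的病害,主要危害叶片和茎秆。</p>"
        "<p>防治方法包括选用抗病品种和喷施杀菌剂。</p></div>"
        "<div>版权所有</div>"
    )
    for block in drop_noise(split_blocks(page)):
        print(block)
```

In this sketch, short blocks near the top and bottom of the page (navigation bars, copyright lines) are held to a stricter length threshold than blocks in the middle, which is one simple way to combine location and length. The actual labels and thresholds would have to be tuned on real pages.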

[1]  Jeff Z. Pan. A Flexible Ontology Reasoning Architecture for the Semantic Web, 2007, IEEE Transactions on Knowledge and Data Engineering.

[2]  An Zeng. The research on vision-based Web page information extraction algorithm, 2010.

[3]  Diana Maynard, et al. Automatic Creation and Monitoring of Semantic Metadata in a Dynamic Knowledge Portal, 2004, AIMSA.

[4]  Suk I. Yoo, et al. DOM tree browsing of a very large XML document: Design and implementation, 2009, J. Syst. Softw.

[5]  Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[6]  Zhang Dong-mei. Survey of Web information extraction technologies, 2010.

[7]  Samaneh Shokravi, et al. An ontology approach to support FMEA studies, 2009, 2009 Annual Reliability and Maintainability Symposium.

[8]  Magdi N. Kamel, et al. A Methodology for Developing Ontologies Using the Ontology Web Language (OWL), 2007, ICEIS.

[9]  Bo Jin, et al. Product design reuse with parts libraries and an engineering semantic web for small- and medium-sized manufacturing enterprises, 2008.

[10]  Yu Kun, et al. Resume Information Extraction Based on Cascaded Double-layer Classification, 2006.

[11]  Omar Chiotti, et al. Towards ontological engineering: a process for building a domain ontology from scratch in public administration, 2008, Expert Syst. J. Knowl. Eng.

[12]  Lin Chang-song. Applying Model and Technological Realization about Heterogeneous Data Integrating System, 2006.

[13]  Y. Biletskiy, et al. Semantic annotation of semi-structured documents, 2008, 2008 Canadian Conference on Electrical and Computer Engineering.

[14]  He Feng, et al. Research of Chinese Ontology Learning Based on HowNet, 2011.