Web information extraction based on news domain ontology theory

For the current web information extraction can't adapt to the various page structures, this paper proposes a Web Information Extraction Method based on News Domain Ontology. The areas are accurately found out and the interested information was extracted exactly based on information extraction rules which is generated by news domain ontology. Using the technology of page processing, page conversion, XPath etc, the information extraction system based on news domain ontology is implemented. Testing from news site shows that the approach proposed doesn 't rely on the page structure and it can increase the recall and precision of information extraction.