A Web Information Extraction Method Based on Ontology

Web information extraction is an important and difficult research topic that draws on many fields, such as artificial intelligence and machine learning. As a modeling tool for describing the conceptual model of information systems at the semantic and knowledge level, ontology has been widely used in many areas of computer science in recent years. This paper proposes a new method that uses ontology to extract valuable information from web documents. First, according to the characteristics of the websites and web pages, the text content of each web page is extracted by locating the region of the page that contains it. Second, building on the traditional vector space model and the domain ontology, concept vectors are generated by weighting each concept in combination with the hierarchical structure of the ontology. In this way, instances of the ontology knowledge base are created semi-automatically, and the unstructured text of web pages is converted into semantically structured information that machines can understand.
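
To make the second step more concrete, the following is a minimal sketch of how a concept vector might be built from page text using a domain ontology. It is not the weighting scheme of this paper, which the abstract does not specify. The toy ontology, the names `ONTOLOGY`, `concept_vector`, and `best_concept`, and the choice to multiply term frequency by concept depth are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical toy domain ontology: each concept has a parent and a list
# of lexical terms that signal it in text. Purely illustrative.
ONTOLOGY = {
    "Product": {"parent": None, "terms": ["product", "item"]},
    "Computer": {"parent": "Product", "terms": ["computer", "pc"]},
    "Laptop": {"parent": "Computer", "terms": ["laptop", "notebook"]},
}


def depth(concept):
    """Level of a concept in the hierarchy (root = 1)."""
    level = 1
    while ONTOLOGY[concept]["parent"] is not None:
        concept = ONTOLOGY[concept]["parent"]
        level += 1
    return level


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concept_vector(text):
    """Map text to a normalized vector over ontology concepts.

    Assumed weighting: term frequency of the concept's terms multiplied
    by the concept's depth, so more specific concepts weigh more.
    """
    counts = Counter(tokenize(text))
    raw = {}
    for concept, info in ONTOLOGY.items():
        tf = sum(counts[term] for term in info["terms"])
        raw[concept] = tf * depth(concept)
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {c: (w / norm if norm else 0.0) for c, w in raw.items()}


def best_concept(text):
    """Return the highest-weighted concept, or None if nothing matched."""
    vec = concept_vector(text)
    top = max(vec, key=vec.get)
    return top if vec[top] > 0 else None
```

In this sketch, a text fragment whose best concept is, say, `Laptop` could then be proposed as a candidate instance of that concept for a human to confirm, which is one plausible reading of "semi-automatically" here.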