Web Information Extraction for the Creation of Metadata in Semantic Web

In this paper, we develop an automatic metadata creation system for the Semantic Web based on information extraction technology. The information extraction system begins with a preprocessing stage that takes written text as input and produces part-of-speech (POS) tags for the words in each sentence. We then apply finite-state machine technology to extract units from the tagged sequences, including complex words, basic phrases, and domain events. We use the components of GATE, an NLP software architecture, as the processing engine and supply all the language resources the engine requires. We have carried out an experiment on Chinese financial news. The results show promising precision, while recall needs further investigation. We describe how the extracted results are stored as RDF on an RDF server and present the service interface for accessing the content.
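As a rough illustration of the pipeline described above, the following minimal sketch runs a small finite-state pattern over POS-tagged tokens and serializes the extracted phrases as RDF triples in N-Triples form. It is not the GATE-based implementation: the tag names (ADJ, NOUN), the pattern ADJ* NOUN+, the function names, and the example.org URIs are all assumptions made for illustration, and English tokens are used in place of Chinese text for readability.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set is hypothetical


def extract_noun_phrases(tokens: List[Token]) -> List[str]:
    """Accept maximal runs matching ADJ* NOUN+ with a small state machine."""
    phrases: List[str] = []
    pending: List[str] = []   # words of the current candidate phrase
    seen_noun = False         # True once the candidate contains a noun
    # A sentinel token flushes a phrase that ends the sentence.
    for word, tag in tokens + [("", "END")]:
        if tag == "ADJ" and not seen_noun:
            pending.append(word)
        elif tag == "NOUN":
            pending.append(word)
            seen_noun = True
        else:
            if seen_noun:
                phrases.append(" ".join(pending))
            # An adjective right after a noun run starts a new candidate.
            pending = [word] if tag == "ADJ" else []
            seen_noun = False
    return phrases


def to_ntriples(doc_uri: str, phrases: List[str]) -> List[str]:
    """Serialize each phrase as one N-Triples statement about the document."""
    predicate = "http://example.org/schema#mentions"  # hypothetical property
    lines = []
    for phrase in phrases:
        # Escape backslashes and double quotes inside the literal.
        literal = phrase.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{doc_uri}> <{predicate}> "{literal}" .')
    return lines


if __name__ == "__main__":
    tagged = [("the", "DET"), ("central", "ADJ"), ("bank", "NOUN"),
              ("raised", "VERB"), ("interest", "NOUN"), ("rates", "NOUN")]
    for line in to_ntriples("http://example.org/doc/1",
                            extract_noun_phrases(tagged)):
        print(line)
```

In a fuller system, the triples would be sent to an RDF server rather than printed, and the single pattern would be replaced by cascaded grammars for complex words, basic phrases, and domain events.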