Study of a Subject-Oriented Web Information Extractor Based on XML

To address the Web data extraction problem in Web mining, a Web data extraction method based on XML is designed. The defining characteristic of Web data is that it is semi-structured, which makes it hard to store in a traditional relational database. XML, itself a semi-structured data model, is used to solve this problem: the element descriptions of an XML document are mapped to database fields, which enables accurate querying and model extraction. Because most of the information in Web data is irrelevant to the extraction task, XSL is used to filter out the irrelevant data and to extract in real time. Finally, the merged extraction results are saved in an XML document. Tests indicate that the method handles the extraction and storage of Web data effectively.
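The mapping between XML element descriptions and database fields can be illustrated with a minimal sketch. It is not the implementation described in the paper: the sample document, the element names (`item`, `title`, `author`, `price`), the table layout, and the helper functions are all hypothetical, and Python's `xml.etree.ElementTree` is used here as a stand-in for the XSL filtering step, since the abstract does not give the stylesheet.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured page content after conversion to XML.
# <nav>, <banner>, and <footer> play the role of data irrelevant to extraction.
PAGE_XML = """
<page>
  <nav>Home | Products | Contact</nav>
  <item>
    <title>XML in Practice</title>
    <author>A. Writer</author>
    <price>29.50</price>
    <banner>Buy now!</banner>
  </item>
  <item>
    <title>Web Mining Basics</title>
    <price>41.00</price>
  </item>
  <footer>Copyright notice</footer>
</page>
"""

# Assumed mapping: each XML element name corresponds to one database column.
FIELDS = ("title", "author", "price")


def extract_records(xml_text):
    """Keep only the mapped fields of each <item>; everything else is dropped."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        # findtext returns None when an element is missing, which suits
        # semi-structured data where fields are optional.
        records.append(tuple(item.findtext(name) for name in FIELDS))
    return records


def store(records):
    """Save the extracted records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (title TEXT, author TEXT, price TEXT)")
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


records = extract_records(PAGE_XML)
conn = store(records)
rows = conn.execute("SELECT title, author FROM item ORDER BY title").fetchall()
```

In this sketch a missing element becomes a NULL column value, which is one simple way to reconcile optional fields in semi-structured data with a fixed relational schema; the method in the paper may handle this differently.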