Study on Web data extraction

This paper presents a Web data extraction system based on comparing and analyzing Web pages within a single website. Each page is first parsed into a tree and divided into blocks, and the data block of the page is located by comparing and analyzing these blocks. The data is then extracted by comparing two or more pages that share the same structure and format, which enables in-depth mining of technical information. The paper describes the system's architecture and components, and reports how the system was tested on physical property databases in chemistry.
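
The page-comparison idea can be illustrated with a minimal Python sketch. It is not the system's actual algorithm, which the abstract does not specify. It assumes that two pages generated from the same template yield the same sequence of text nodes, so that text which differs at the same tree position is data and text which is identical is template. The names `PathTextParser`, `text_nodes`, and `extract_data` and the two sample pages are hypothetical, and the blocking step is reduced here to grouping text by its tag path.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathTextParser(HTMLParser):
    """Collect (tag path, text) pairs, a flat stand-in for the page tree."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open tags from the root down to the current node
        self.items = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def text_nodes(html):
    """Parse an HTML string and return its (tag path, text) pairs."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def extract_data(page_a, page_b):
    """Return the text of page_a that differs from page_b at the same position.

    Assumption: both pages come from one template, so their text nodes
    line up one to one. Identical text is treated as template, not data.
    """
    data = []
    for (path_a, text_a), (path_b, text_b) in zip(text_nodes(page_a), text_nodes(page_b)):
        if path_a == path_b and text_a != text_b:
            data.append((path_a, text_a))
    return data


# Two made-up pages sharing one template (illustrative only).
TEMPLATE = (
    "<html><body><h1>Physical properties</h1><table>"
    "<tr><td>Name</td><td>{name}</td></tr>"
    "<tr><td>Boiling point</td><td>{bp}</td></tr>"
    "</table></body></html>"
)
page_a = TEMPLATE.format(name="Benzene", bp="80.1 C")
page_b = TEMPLATE.format(name="Toluene", bp="110.6 C")

for path, text in extract_data(page_a, page_b):
    print(path, "->", text)
```

In this sketch the heading and the row labels are shared by both pages and are discarded, while the compound name and boiling point differ and are kept. A practical system would need a more robust alignment than positional pairing, since real pages from one template can differ in the number of rows or blocks.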