Directional web data extraction method
The invention provides a directional web data extraction method. The method analyzes the source-code grammar of web files according to the structural features that the target web data exhibits in those files, then uses a regular expression to construct a data matching model that captures those structural features. The model is matched against the source code of the web files, and the target web data is extracted from the matched portion of the source code, which solves the problem of directional web data extraction. Because the regular expression serves as the matching tool, the method is readily operable by persons skilled in the art, which favors its popularization and application. For web data whose structural features are more complex and which is harder to extract, the invention further provides a directional extraction scheme that extracts the target data step by step through multistage location, giving the method stronger adaptability and a wider range of application.
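
As an illustration only, the multistage location idea described above might be sketched in Python as follows. The page markup, the names `HTML`, `BLOCK_RE`, `ROW_RE`, and `extract_prices`, and the two-stage split are all assumptions made for this example; the abstract does not specify any particular patterns or implementation.

```python
import re

# Hypothetical page source; the markup and field names are illustrative only.
HTML = """
<div id="nav"><a href="/home">Home</a></div>
<table class="prices">
<tr><td class="name">Widget</td><td class="price">12.50</td></tr>
<tr><td class="name">Gadget</td><td class="price">7.00</td></tr>
</table>
<div id="footer"><td class="name">Not a product</td></div>
"""

# Stage 1 (assumed): locate the block of source code that holds the target data.
BLOCK_RE = re.compile(r'<table class="prices">(.*?)</table>', re.DOTALL)

# Stage 2 (assumed): match the structural pattern of each record in that block.
ROW_RE = re.compile(
    r'<td class="name">(?P<name>[^<]*)</td>\s*'
    r'<td class="price">(?P<price>[^<]*)</td>'
)


def extract_prices(source: str) -> list[tuple[str, float]]:
    """Return (name, price) pairs found inside the located block."""
    block = BLOCK_RE.search(source)
    if block is None:
        # The structural features were not found, so nothing is extracted.
        return []
    return [
        (m.group("name").strip(), float(m.group("price")))
        for m in ROW_RE.finditer(block.group(1))
    ]


print(extract_prices(HTML))
```

Narrowing the search to the located block before applying the record pattern keeps look-alike markup elsewhere on the page, such as the footer cell in this sample, out of the results.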