Design and Research of Composite Web Page Classification Network Based on Deep Learning

This paper proposes a network model that combines long and short feature extractors to address the automatic classification of web pages. The main categories of the original corpus are defined by analyzing the current major portal websites, and a composite feature extraction scheme built from the long and short feature extractors is designed by analyzing the composition of web page content. An attention mechanism is introduced in the short feature extraction network to strengthen information extraction from short text. For longer text, the long feature extraction network combines word-level and segment-level attention to capture information. In the final classification layer, a correction mechanism is used for model fusion, which further improves classification accuracy. The experimental results show that the proposed method achieves higher classification accuracy: the classification indicators all reached 0.94 or higher under the first-level labels and 0.90 under the second-level labels. The composite feature extraction network designed in this paper also shows better noise resistance and classification efficiency.
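
The combination of word-level and segment-level attention in the long feature extractor can be illustrated with a minimal sketch. It is not the model of this paper: the dot-product scoring, the context vectors `word_context` and `segment_context` (fixed here, learned in a real network), the function names, and the toy vectors are all assumptions made for illustration, and the recurrent or convolutional encoders that would normally produce the word vectors are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    # Assumption: plain dot-product scoring against a context vector.
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def hierarchical_pool(segments, word_context, segment_context):
    """Pool words into segment vectors, then segments into a document vector."""
    segment_vectors = [attention_pool(words, word_context)[0] for words in segments]
    return attention_pool(segment_vectors, segment_context)


# Toy document: two segments of 2-dimensional word vectors (made-up values).
segments = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
]
doc_vector, segment_weights = hierarchical_pool(segments, [1.0, 0.0], [0.0, 1.0])
```

Here `doc_vector` is a single fixed-length representation of the long text, and `segment_weights` indicates how much each segment contributed to it. In a classifier, such a vector would be passed to the output layer together with the short-text features.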