Research on LDA Model Algorithm of News-oriented Web Crawler

With the rapid development of big data, the volume and variety of information on web pages are growing tremendously. Consequently, it is becoming increasingly difficult for users to obtain valuable and interesting information from the web. This paper designs and implements a topic-focused crawler, built as a lightweight directed crawler in Node.js, that captures the data and greatly improves the efficiency of page retrieval. First, the design idea and workflow of the web crawler project are introduced. Next, the jieba package for Python is used to perform word segmentation on the crawled data. Finally, the LDA (Latent Dirichlet Allocation) topic model is applied to the resulting keyword texts in order to classify the news into different types.
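To illustrate the final classification step, the sketch below fits an LDA model by collapsed Gibbs sampling and labels each document with its most probable topic. It is a minimal, dependency-free sketch under stated assumptions, not the implementation used in this paper. The abstract does not say which LDA inference method was used, so collapsed Gibbs sampling is an assumed choice. The documents are a hypothetical toy sample given as already-segmented token lists, standing in for jieba output (jieba itself is not called). The function names `lda_gibbs` and `classify`, and the values of `alpha`, `beta`, the number of topics, and the iteration count, are likewise hypothetical.

```python
import random

# Hypothetical sketch: collapsed Gibbs sampling for LDA. The inference method,
# hyperparameters and function names are assumptions, not the paper's method.
def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """docs: list of token lists (e.g. output of a word segmenter such as jieba).
    Returns theta (per-document topic distributions) and phi (per-topic word
    distributions, as dicts keyed by word)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return theta, phi


def classify(theta):
    """Label each document with the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Hypothetical toy corpus of pre-segmented news keywords (sports vs. finance).
docs = [
    ["球队", "比赛", "进球", "联赛"],
    ["比赛", "球员", "进球", "冠军"],
    ["股市", "经济", "银行", "利率"],
    ["经济", "股票", "银行", "投资"],
]
theta, phi = lda_gibbs(docs, num_topics=2)
print(classify(theta))
```

Assigning each article to the topic with the largest share in its topic distribution is one simple way to turn LDA output into news categories. In practice, the number of topics and the hyperparameters would need tuning on the crawled corpus.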