KWB: An Automated Quick News System for Chinese Readers

We present KWB, an automated quick news system. KWB crawls news items around the clock from over 120 news websites in mainland China, eliminates duplicates, and obtains a summary of up to 600 characters for each article using a proprietary summary engine. It then uses a Labeled-LDA classifier to sort the remaining news items into 19 categories, computes a popularity rank, called PopuRank, for the newly collected items in each category, and displays the summaries in each category, sorted by PopuRank and accompanied by a picture when one is available, on http://www.kuaiwenbao.com and in mobile apps. In this paper we describe the system architecture of KWB, the structure of the data crawler, the functions of the central database, and the definition of PopuRank. We report experiments that measure the running time of computing PopuRank, and we demonstrate the use of KWB.
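
The processing order described above (deduplicate, summarize, classify, rank within each category) can be illustrated with a minimal Python sketch. It is not KWB's implementation: the names `NewsItem`, `fingerprint`, `deduplicate`, and `run_pipeline` are hypothetical, duplicate detection by hashing whitespace-normalized text is an assumption, and the summary engine, the Labeled-LDA classifier, and the PopuRank score are passed in as caller-supplied functions because their definitions are not given in this abstract.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

MAX_SUMMARY_CHARS = 600  # summary length limit stated in the abstract


@dataclass
class NewsItem:
    url: str
    title: str
    body: str
    summary: str = ""
    category: str = ""
    score: float = 0.0  # stand-in for PopuRank; its definition is not given here


def fingerprint(item: NewsItem) -> str:
    # Assumption: two items are duplicates when their title and body are
    # identical after removing all whitespace. KWB's actual rule may differ.
    normalized = "".join((item.title + item.body).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items: Iterable[NewsItem]) -> List[NewsItem]:
    # Keep the first item seen for each fingerprint, preserving input order.
    seen = set()
    unique = []
    for item in items:
        key = fingerprint(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def run_pipeline(
    items: Iterable[NewsItem],
    summarize: Callable[[NewsItem], str],
    classify: Callable[[NewsItem], str],
    score: Callable[[NewsItem], float],
) -> Dict[str, List[NewsItem]]:
    # The three callables are placeholders for the proprietary summary
    # engine, the Labeled-LDA classifier, and the PopuRank computation.
    by_category: Dict[str, List[NewsItem]] = defaultdict(list)
    for item in deduplicate(items):
        item.summary = summarize(item)[:MAX_SUMMARY_CHARS]
        item.category = classify(item)
        item.score = score(item)
        by_category[item.category].append(item)
    # Within each category, list the highest-scoring items first.
    for group in by_category.values():
        group.sort(key=lambda it: it.score, reverse=True)
    return dict(by_category)
```

Passing the three stages as functions keeps the sketch independent of any particular summarizer, classifier, or ranking formula, so each can be replaced without touching the surrounding flow.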
