Personal News RSS Feeds Generation Using Existing News Feeds

A growing number of news sites publish their stories as RSS feeds for easier access and subscription on the Web. News stories are generally grouped into several categories, with each category corresponding to one RSS feed. However, there is no uniform standard for categorization: each news site groups its stories in its own way. These differing categorizations cannot satisfy every individual user, and the provided categories are generally not detailed enough for personal use. In this paper, we propose a method that lets users create customizable personal news RSS feeds from existing ones. We implemented a news directory system (NDS) that retrieves news stories through RSS feeds and classifies them. Using this system, we can recategorize news stories from the original RSS feeds or subdivide a single RSS feed into more detailed levels. With the classification information for each news article, we offer customizable personal news RSS feeds to subscribers.
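The pipeline described above (read the items of an existing feed, assign each item a finer category, and emit a new feed restricted to the categories a subscriber wants) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the sample feed, the `SUBCATEGORIES` keyword sets, and the function names are hypothetical, and the keyword-overlap classifier is only a stand-in, since the abstract does not say which classification method the NDS uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical source feed: one coarse "Sports" category from a news site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Site: Sports</title>
    <link>http://example.com/sports</link>
    <description>Sports news</description>
    <item>
      <title>Local football club wins league final</title>
      <link>http://example.com/a1</link>
      <description>The football team scored twice in the final.</description>
    </item>
    <item>
      <title>Tennis star reaches semifinal</title>
      <link>http://example.com/a2</link>
      <description>A straight-sets tennis victory.</description>
    </item>
  </channel>
</rss>"""

# Hypothetical subcategories with hand-picked keywords (a stand-in classifier).
SUBCATEGORIES = {
    "football": {"football", "goal", "league"},
    "tennis": {"tennis", "semifinal", "sets"},
}


def parse_items(feed_xml):
    """Extract title, link, and description from every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def classify(item):
    """Return the subcategory whose keywords overlap most with the item text."""
    words = set(re.findall(r"[a-z]+", (item["title"] + " " + item["description"]).lower()))
    best, best_score = "other", 0
    for category, keywords in SUBCATEGORIES.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = category, score
    return best


def build_personal_feed(items, wanted, title):
    """Build an RSS 2.0 document containing only items in the wanted categories."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.com/personal"
    ET.SubElement(channel, "description").text = "Personal feed: " + ", ".join(sorted(wanted))
    for item in items:
        category = classify(item)
        if category in wanted:
            node = ET.SubElement(channel, "item")
            ET.SubElement(node, "title").text = item["title"]
            ET.SubElement(node, "link").text = item["link"]
            ET.SubElement(node, "description").text = item["description"]
            ET.SubElement(node, "category").text = category
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    stories = parse_items(SAMPLE_FEED)
    print(build_personal_feed(stories, {"tennis"}, "My tennis feed"))
```

In this sketch a subscriber's preferences are just a set of subcategory names; a real system would presumably fetch feeds over HTTP, use a trained classifier, and store the per-article classification so that many personal feeds can be served from it.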
