Advertising Keywords Extraction from Web Pages

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this has become a rapidly growing business in recent years. We describe a system that learns how to extract keywords from web pages for advertisement targeting. First, a text network is built for a single web page; then PageRank is applied to the network to score the importance of each word; finally, the top-ranked words are selected as the keywords of the page. The algorithm is tested on a corpus of blog pages, and the experimental results suggest that it is practical and effective.
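The three steps named in the abstract (build a text network, run PageRank, keep the top-ranked words) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the network is built, so the sliding co-occurrence window, the undirected edges, the damping factor of 0.85, the fixed iteration count, and the function names `build_text_network`, `pagerank`, and `extract_keywords` are all assumptions made for the example. No stop-word filtering or part-of-speech selection is applied here, although a practical system would likely need one.

```python
import re
from collections import defaultdict


def build_text_network(text, window=2):
    """Build an undirected word graph (assumed construction):
    two distinct words are linked if they occur within `window` tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph.
    Every node has at least one neighbour, so there are no dangling nodes."""
    nodes = list(graph)
    if not nodes:
        return {}
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def extract_keywords(text, top_k=5, window=2):
    """Return the `top_k` highest-ranked words (ties broken alphabetically)."""
    rank = pagerank(build_text_network(text, window))
    return sorted(rank, key=lambda w: (-rank[w], w))[:top_k]
```

Calling `extract_keywords(page_text, top_k=10)` on the visible text of a page returns up to ten candidate keywords. The window size controls how dense the network is, so it is the first parameter to tune under these assumptions.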
