Contextual Advertising (CA) refers to the placement of ads that are contextually related to the content of a web page. A central task in CA is finding advertising keywords on web pages. We present a new candidate selection method for extracting advertising keywords from a web page. The method uses Part-of-Speech (POS) patterns to restrict the number of candidates a classifier has to handle: it keeps only those words and phrases whose POS sequence matches one of a selected set of patterns. We design four systems that differ in their chunking method and in the features they use. Each system trains a naive Bayes classifier on a set of web pages annotated with advertising keywords, and can then find advertising keywords on previously unseen web pages. Empirical evaluation shows that the systems using the proposed POS-pattern chunking method perform better than the systems using n-gram based chunking. All improvements are statistically significant at the 99% confidence level.
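To illustrate how POS patterns shrink the candidate set compared with n-gram chunking, here is a minimal Python sketch. It assumes the page text has already been tokenized and tagged with Penn Treebank tags. The pattern set `ALLOWED_PATTERNS` and the function names are hypothetical: the abstract does not list the actual patterns the systems use.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# Hypothetical pattern set (assumption): noun-phrase-like tag sequences.
# The real systems use their own selected set of POS patterns.
ALLOWED_PATTERNS: Set[Tuple[str, ...]] = {
    ("NN",), ("NNS",), ("NNP",),
    ("JJ", "NN"), ("JJ", "NNS"), ("NN", "NN"), ("NN", "NNS"),
    ("NNP", "NNP"), ("NNP", "NNS"),
    ("JJ", "JJ", "NN"), ("JJ", "JJ", "NNS"),
    ("JJ", "NN", "NN"), ("JJ", "NN", "NNS"),
}


def ngram_candidates(tokens: List[Token], max_n: int = 3) -> List[str]:
    """Baseline chunking: every contiguous word sequence of length 1..max_n."""
    words = [w for w, _ in tokens]
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def pos_pattern_candidates(tokens: List[Token],
                           patterns: Set[Tuple[str, ...]] = ALLOWED_PATTERNS,
                           max_n: int = 3) -> List[str]:
    """Keep only spans whose tag sequence is one of the allowed POS patterns."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            span = tokens[i:i + n]
            if tuple(tag for _, tag in span) in patterns:
                out.append(" ".join(w for w, _ in span))
    return out


# Hypothetical pre-tagged sentence from a web page.
tagged = [("Compare", "VB"), ("cheap", "JJ"), ("digital", "JJ"),
          ("cameras", "NNS"), ("at", "IN"), ("Canon", "NNP"), ("stores", "NNS")]

print(len(ngram_candidates(tagged)))   # 18
print(pos_pattern_candidates(tagged))
# ['cameras', 'Canon', 'stores', 'digital cameras', 'Canon stores',
#  'cheap digital cameras']
```

On this seven-token example, n-gram chunking up to trigrams produces 18 candidates, while the pattern filter keeps 6 noun-phrase-like ones. In the systems described above, a naive Bayes classifier would then score only the filtered candidates.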