Research on Extracting Relevant Snippet from Blog Pages Using Title and Issue Tags
Search engines display each result URL together with a few sentences extracted from the page to summarize its content; these sentences are called snippets. To generate high-quality snippets, previous studies evaluate the similarity between the query and the content by counting query terms in the document, assigning higher scores to title words, or considering snippet location within the document. However, these methods consider only inner page features. We propose a new method that uses both inner page features and outer page features, which reflect the characteristics of the page. Blog pages are usually more sensitive to issues than regular web pages because they are much more ‘media-like’. For this reason, we choose ‘social issues’ as an outer page feature. Because titles summarize the content and queries represent user intention, our method also uses titles and queries as inner page features. We compare methods that use only inner page features with our method, which considers both inner and outer page features.
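The abstract does not say how the features are combined. The following is a minimal sketch, assuming a simple linear score per sentence that counts query-term, title-term, and issue-tag matches (inner and outer features) plus a small bonus for earlier position. The function names `score_sentence` and `extract_snippet`, the weights, and the position bonus are all hypothetical illustrations, not the authors' method.

```python
import re

def tokenize(text):
    """Lowercase word tokens via a simple regex (hypothetical; no stemming)."""
    return re.findall(r"\w+", text.lower())

def score_sentence(sentence, position, query, title, issue_tags,
                   w_query=1.0, w_title=0.5, w_issue=0.5, w_position=0.2):
    """Hypothetical linear score combining inner and outer page features."""
    tokens = tokenize(sentence)
    query_terms = set(tokenize(query))           # inner feature: user intention
    title_terms = set(tokenize(title))           # inner feature: page summary
    issue_terms = {t for tag in issue_tags       # outer feature: social issue tags
                   for t in tokenize(tag)}
    query_hits = sum(1 for t in tokens if t in query_terms)
    title_hits = sum(1 for t in tokens if t in title_terms)
    issue_hits = sum(1 for t in tokens if t in issue_terms)
    # Location feature: earlier sentences receive a slightly larger bonus.
    position_bonus = 1.0 / (1 + position)
    return (w_query * query_hits + w_title * title_hits
            + w_issue * issue_hits + w_position * position_bonus)

def extract_snippet(sentences, query, title, issue_tags, k=1):
    """Return the k highest-scoring sentences, in original document order."""
    scored = [(score_sentence(s, i, query, title, issue_tags), i, s)
              for i, s in enumerate(sentences)]
    # Sort by descending score; ties broken by earlier position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    top = sorted(scored[:k], key=lambda item: item[1])
    return [s for _, _, s in top]

sentences = [
    "I went hiking last weekend with friends.",
    "The new minimum wage law sparked protests downtown.",
    "Wage protests continued as lawmakers debated the minimum wage bill.",
]
snippet = extract_snippet(sentences, "minimum wage",
                          "Protests over the minimum wage law",
                          ["minimum wage", "protest"])
print(snippet)
```

In this toy input the third sentence wins because it repeats both query and issue terms; dropping the issue tags (setting `w_issue=0`) would show how much the outer feature contributes relative to the inner ones.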