Extraction and Evaluation of Candidate Named Entities in Search Engine Queries

Named Entity Recognition (NER) has recently been applied to search queries in order to better understand their semantics. We present a novel method for detecting candidate named entities (NEs) that combines grammar annotation and query segmentation, using the top-n snippets from search engine results and a web n-gram model to identify NE boundaries accurately. We then evaluate this method automatically, using DBpedia as a rich data source of NEs, with the aid of a small, representative random sample that is manually annotated. Finally, we analyse the types of named entities that frequently occur in a query log and, from this analysis, present a search-query-driven named entity taxonomy.
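
To make the role of an n-gram model in boundary detection concrete, the following is a minimal sketch of query segmentation by dynamic programming. It is an illustration under stated assumptions, not the method evaluated in this work: the counts in TOY_COUNTS, the corpus size TOTAL, the unseen-segment penalty, and the function names score and segment_query are all hypothetical, and a real system would look up probabilities in a web-scale n-gram model instead of a small dictionary.

```python
import math

# Hypothetical toy n-gram counts (assumption for illustration only);
# a real system would query a web-scale n-gram model instead.
TOY_COUNTS = {
    "new": 500, "york": 300, "times": 400, "subscription": 50,
    "new york": 250, "york times": 120, "new york times": 100,
    "times subscription": 2,
}
TOTAL = 10_000  # assumed corpus size for the toy counts


def score(segment):
    """Log-probability of a segment; unseen segments get a heavy penalty."""
    count = TOY_COUNTS.get(segment, 0)
    if count == 0:
        # Arbitrary penalty per word, chosen only for this sketch.
        return -20.0 * len(segment.split())
    return math.log(count / TOTAL)


def segment_query(query):
    """Return the segmentation of the query with the highest total score."""
    words = query.lower().split()
    n = len(words)
    # best[i] holds (total score, segments) for the first i words.
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(end):
            seg = " ".join(words[start:end])
            cand = best[start][0] + score(seg)
            if cand > best[end][0]:
                best[end] = (cand, best[start][1] + [seg])
    return best[n][1]


print(segment_query("new york times subscription"))
```

With these toy counts the query is split into "new york times" and "subscription", so the multi-word segment becomes a candidate NE whose boundaries were chosen by the n-gram scores. Summing log-probabilities favours fewer, frequently attested segments, which is one simple way to express the boundary decision; other scoring functions are equally possible.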