Query expansion using information scent

The Web has grown into a huge and diverse information resource. To search such a rich source of information, one has to choose query keywords very precisely in order to retrieve the relevant documents. Most queries issued to search engines, however, are short and ambiguous in context. One way to produce effective queries is automatic query expansion. Prior work in this field uses local and global techniques. Global techniques examine word occurrences and relationships in the corpus as a whole and use this information to expand a particular query. Local context analysis examines concept occurrences and relationships in the top-ranked documents retrieved by the original input query in order to expand that query. Researchers have also used search engine query logs, expanding an input query with the clicked documents of those query sessions in the log that are related to any of its terms. In this paper, a new local analysis technique is proposed that uses the information need of query sessions, modeled using Information Scent and the content of clicked documents, to select the clicked documents for query expansion. Information scent is the subjective sense of the value and cost of accessing a page, based on perceptual cues, with respect to the information need of the user. The input query issued in a particular domain is used to select the set of documents associated with the information need of the query sessions in the same domain; these documents serve as a local corpus that provides a set of related terms to be added to the input query. The resulting expanded query is used to retrieve the relevant documents from the same retrieval system.
The approach is distinctive in that its local corpus contains only those documents that match the information need associated with the domain in which the input query is issued, as modeled using Information Scent and the content of clicked pages in the query sessions. Expanding the initial input query with a set of related terms drawn from this corpus directs the search in a fruitful direction. An experimental study of the proposed approach was carried out on a data set extracted from the Web history of the "Google" search engine. The improvement in information retrieval precision, together with low computational complexity during online processing of input queries, confirms the effectiveness of the proposed approach.
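
The pipeline described above (model each query session by its scent-weighted clicked content, select the sessions that match the input query, and draw expansion terms from their clicked documents) can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. The abstract does not give the information scent formula, so the sketch assumes each clicked page already carries a precomputed `scent` weight. The cosine matching, the term scoring, and all names (`session_vector`, `expand_query`, `top_sessions`, `num_terms`) and sample data are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def session_vector(session):
    """Model a session's information need as a scent-weighted term vector.

    ASSUMPTION: page["scent"] is a precomputed information scent weight;
    how it is derived is not specified here.
    """
    vec = Counter()
    for page in session["clicks"]:
        for term, tf in Counter(tokenize(page["text"])).items():
            vec[term] += page["scent"] * tf
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query, sessions, top_sessions=2, num_terms=3):
    """Append the strongest local-corpus terms to the input query."""
    q_terms = tokenize(query)
    q_vec = Counter(q_terms)
    # Rank sessions by how well their information need matches the query.
    scored = [(cosine(q_vec, v), v) for v in map(session_vector, sessions)]
    scored.sort(key=lambda sv: sv[0], reverse=True)
    # Local corpus: scent-weighted terms of the best-matching sessions.
    local = Counter()
    for sim, vec in scored[:top_sessions]:
        if sim > 0:
            local.update(vec)
    # Candidate terms are local-corpus terms not already in the query;
    # ties are broken alphabetically to keep the output deterministic.
    candidates = [(t, w) for t, w in local.items() if t not in q_vec]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return q_terms + [t for t, _ in candidates[:num_terms]]


# Hypothetical query sessions with made-up scent weights.
sessions = [
    {"clicks": [{"text": "jaguar car dealer price", "scent": 0.9},
                {"text": "jaguar car engine", "scent": 0.5}]},
    {"clicks": [{"text": "jaguar cat habitat rainforest", "scent": 0.8}]},
    {"clicks": [{"text": "python tutorial code", "scent": 0.7}]},
]

print(expand_query("jaguar car", sessions, top_sessions=1, num_terms=2))
# → ['jaguar', 'car', 'dealer', 'price']
```

In this toy example the ambiguous term "jaguar" is expanded only with terms from the automobile-related session, because that session's scent-weighted content is closest to the input query. The session vectors can be computed offline from the query log, so that only the similarity ranking and term selection run at query time.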