N-layer Approach to Web Information Retrieval

In web information retrieval, terms or keywords are used to index documents. These terms often appear in special locations such as the title, subtitles, headers, and hyperlinks. The vector space model ignores the position of a term when calculating the weight of an indexing term, yet its effectiveness depends crucially on the weights applied to the terms of the document vectors. These weights are computed with a term weighting scheme based on the frequency of each term in the document and in the collection: terms that occur more often in a document are treated as more important, whereas terms that occur less frequently across the collection are given a higher weight. The N-layer vector space approach also takes the position of each term into account. The web document is logically divided into N layers according to its structure, and weights are assigned to terms based on the layers in which they appear. Different weighting schemes proposed for the vector space model are applied to the N-layer vector space model and compared. The N-layer vector space model gives better results than the vector space model: using cosine similarity and all six weighting schemes, formed from different combinations of local and global weights, average precision and average recall are always better for the N-layer model.
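As a rough illustration of the idea, the following minimal Python sketch scales each term occurrence by a weight tied to the layer in which it appears, then applies a plain tf × idf weighting and cosine similarity. The layer names (title, header, body), the numeric layer weights, and the function names are assumptions made for this example and are not taken from the paper; likewise, the single tf × idf scheme shown here merely stands in for the six local and global weight combinations that the paper compares.

```python
import math
from collections import Counter

# Hypothetical layer weights: illustrative assumptions, not values from the paper.
LAYER_WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0}


def layered_tf(doc):
    """Layer-weighted term frequency for one document.

    `doc` maps a layer name to the list of terms found in that layer.
    Each occurrence counts as the weight of its layer (1.0 if unknown).
    """
    tf = Counter()
    for layer, terms in doc.items():
        weight = LAYER_WEIGHTS.get(layer, 1.0)
        for term in terms:
            tf[term] += weight
    return dict(tf)


def idf_table(docs):
    """Global weight: log(N / document frequency) for every term."""
    df = Counter()
    for doc in docs:
        seen = set()
        for terms in doc.values():
            seen.update(terms)
        df.update(seen)  # count each term once per document
    n = len(docs)
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_vector(doc, idf):
    """Sparse document vector: layer-weighted tf times idf."""
    return {t: f * idf.get(t, 0.0) for t, f in layered_tf(doc).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = [
        {"title": ["retrieval"], "body": ["web", "index"]},
        {"title": ["web"], "body": ["retrieval", "index"]},
        {"title": ["cooking"], "body": ["recipe", "index"]},
    ]
    idf = idf_table(docs)
    query = weighted_vector({"body": ["retrieval"]}, idf)
    for i, doc in enumerate(docs):
        print(i, round(cosine(query, weighted_vector(doc, idf)), 3))
```

In this toy collection the first two documents contain the same terms, but the first ranks higher for the query "retrieval" because the term sits in its title layer. Setting every layer weight to 1.0 reduces the sketch to the ordinary vector space model, under which those two documents would score identically.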