Analyzing Anchor-Links to Extract Semantic Inferences of a Web Page

Because an anchor in an HTML document points to a related document, picture, or media application, the existing approaches [3,4,5] for obtaining information about the associated Web page rely on the anchor-text contained in the anchor tag. The problem with these approaches is that the anchor-text is sometimes absent altogether, or the anchor tag contains only a single word or an image. In this paper, a dataset of about a hundred Web pages of different categories from the Open Directory Project (ODP) has been surveyed and analyzed. The results show that the cohesive text surrounding the anchor and the non-cohesive text present elsewhere in the Web page provide rich semantic cues about a target Web page.
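
To make the distinction between anchor-text and the text surrounding an anchor concrete, the following is a minimal Python sketch and not the procedure used in the survey. It relies only on the standard-library `html.parser` module. The class name `AnchorContextParser`, the helper `extract_anchor_contexts`, and the fixed word window are assumptions introduced purely for illustration; in particular, a fixed window of neighbouring words is only a crude stand-in for cohesive text.

```python
from html.parser import HTMLParser


class AnchorContextParser(HTMLParser):
    """Collect each anchor's href, its own text, and a window of nearby words.

    Illustrative only: the word window is a hypothetical approximation of the
    text surrounding an anchor. Text inside <script> or <style> elements is
    not filtered out in this sketch.
    """

    def __init__(self, window=5):
        super().__init__()
        self.window = window   # assumed context size, in words, on each side
        self.words = []        # every text token, in document order
        self.anchors = []      # one record per <a href=...> element
        self._open = None      # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._open = {
                "href": attrs["href"],
                "text": [],
                "has_image": False,
                "start": len(self.words),
            }
        elif tag == "img" and self._open is not None:
            # Image anchor: the anchor may carry no text of its own.
            self._open["has_image"] = True

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self._open["end"] = len(self.words)
            self.anchors.append(self._open)
            self._open = None

    def handle_data(self, data):
        tokens = data.split()
        if self._open is not None:
            self._open["text"].extend(tokens)
        self.words.extend(tokens)

    def contexts(self):
        """Return one dict per anchor with its text and surrounding words."""
        results = []
        for anchor in self.anchors:
            before = self.words[max(0, anchor["start"] - self.window):anchor["start"]]
            after = self.words[anchor["end"]:anchor["end"] + self.window]
            results.append({
                "href": anchor["href"],
                "anchor_text": " ".join(anchor["text"]),
                "has_image": anchor["has_image"],
                "surrounding_text": " ".join(before + after),
            })
        return results


def extract_anchor_contexts(html, window=5):
    """Parse an HTML string and return the context record of every anchor."""
    parser = AnchorContextParser(window=window)
    parser.feed(html)
    parser.close()
    return parser.contexts()


if __name__ == "__main__":
    sample = (
        '<p>Read the annual climate report <a href="r.html">here</a> '
        'for temperature trends.</p>'
        '<p><a href="logo.html"><img src="l.png"></a></p>'
    )
    for record in extract_anchor_contexts(sample, window=3):
        print(record)
```

In the sample, the first anchor carries only the single word "here" and the second is an image anchor with no text at all, which are the two situations described above; in both cases the neighbouring words still say something about the target page.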