Classification of web pages with geographic scope and level of details for mobile cache management

Although the web holds a great deal of useful information, identifying the right web pages is very hard because the content is heterogeneous and its volume is large. This problem is crucial when caching web pages on mobile devices, since we have to select pages that satisfy the user's requirements in order to prefetch those with a high probability of being used. This paper focuses on the geographic characteristics and description types of web resources. Keyword-based search does not take into account the positional information of geographic names, so collecting web resources related to a specific region is very difficult. Moreover, it does not distinguish between a web page with detailed information and one with summary information, so it cannot serve users who require detailed information. In this paper, we develop a method to determine the Geographic Scope and Level of Details of web pages. Geographic Scope identifies the region a web page mentions, using the positional information of geographic names. Level of Details classifies web pages into three types, "Table-of-Contents type", "Summary type", and "Detailed-Description type", by considering HTML tags, the number of kinds of geographic names, and statistical values of parts of speech such as verbs and nouns. Experimental results show that these two measures classify web pages with relatively high precision. Finally, we present a Mobile Cache algorithm that uses these two measures. It is defined based on users' interest in a specific location.
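To make the idea of Geographic Scope concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that the scope of a page can be approximated by the bounding box of the coordinates of the geographic names it mentions, and that a page is a prefetch candidate when the user's location of interest falls inside that box. The gazetteer, the function names, and the coordinates are all hypothetical and chosen only for illustration; Level of Details is not modeled here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
# A real system would use a full gazetteer and proper name disambiguation.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Kyoto": (35.01, 135.77),
    "Osaka": (34.69, 135.50),
    "Nara": (34.69, 135.80),
}

# Bounding box as (min_lat, min_lon, max_lat, max_lon).
BBox = Tuple[float, float, float, float]


def geographic_scope(text: str) -> Optional[BBox]:
    """Approximate a page's scope as the bounding box of the places it names.

    Assumption: naive substring matching stands in for real name extraction.
    Returns None when no known geographic name occurs in the text.
    """
    points = [coord for name, coord in GAZETTEER.items() if name in text]
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


def in_scope(scope: BBox, location: Tuple[float, float]) -> bool:
    """Check whether a (lat, lon) location lies inside a bounding box."""
    min_lat, min_lon, max_lat, max_lon = scope
    lat, lon = location
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def select_for_cache(pages: Dict[str, str],
                     location: Tuple[float, float]) -> List[str]:
    """Return the ids of pages whose approximate scope covers the location."""
    selected = []
    for page_id, text in pages.items():
        scope = geographic_scope(text)
        if scope is not None and in_scope(scope, location):
            selected.append(page_id)
    return selected
```

For example, with the pages `{"a": "Temples in Kyoto and Nara", "b": "Osaka food"}` and a location of interest between Kyoto and Nara, only page `a` would be selected. A bounding box is a deliberately crude stand-in for a region: a page naming a single place collapses to a single point, so a practical scope measure would need some notion of extent for each name.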