An Algorithm of City-Level Landmark Mining Based on Internet Forum

The density and accuracy of network entity landmarks are an important foundation of IP geolocation. To address the limited quantity and low reliability of the landmarks produced by current landmark mining methods, this paper proposes an algorithm for city-level landmark mining based on Internet forums. First, the basic principle of Web-based landmark mining methods and their flaws are analyzed. Then, exploiting the huge number of individual IP addresses present in Internet forums, a technical framework for network entity landmark mining based on Internet forums is given. Next, the major processing steps, including the Internet forum selection strategy, IP address extraction, and IP address screening, are described for the two main parts of the framework: the landmark extraction algorithm and the landmark evaluation algorithm. The classic network entity geolocation algorithm GeoTrack is improved and used to evaluate the candidate landmarks. Finally, the feasibility of the framework and algorithm is studied from two aspects: the forum selection strategy and the forum-based landmark mining algorithm. Experimental results on 27 Internet forums of 3 types in 3 cities show that, compared with classic Web-based landmark mining methods, the proposed algorithm not only mines a large number of city-level landmarks but also clearly improves city-level network entity geolocation accuracy.
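The abstract does not spell out the extraction and screening rules, so the following Python sketch is only an illustration of what the "IP address extraction" and "IP address screening" steps might look like. The regular expression, the function names `extract_ips` and `candidate_landmarks`, and the use of "globally routable" as the screening criterion are all assumptions made for this example, not the paper's method.

```python
import ipaddress
import re
from collections import Counter

# Assumed pattern: four dot-separated groups of 1-3 digits that are not
# embedded in a longer dotted number such as "1.2.3.4.5".
IPV4_PATTERN = re.compile(
    r"(?<!\d)(?<!\d\.)(\d{1,3}(?:\.\d{1,3}){3})(?!\d|\.\d)"
)


def extract_ips(text):
    """Return every syntactically valid IPv4 address found in text."""
    found = []
    for match in IPV4_PATTERN.finditer(text):
        try:
            found.append(ipaddress.IPv4Address(match.group(1)))
        except ValueError:
            # Octet out of range (e.g. 999.1.1.1) or otherwise malformed.
            continue
    return found


def candidate_landmarks(posts):
    """Count how often each globally routable address appears in posts.

    Screening here is a stand-in: private, loopback, and reserved
    addresses are dropped because they cannot serve as landmarks.
    """
    counts = Counter()
    for post in posts:
        for ip in extract_ips(post):
            if ip.is_global:
                counts[str(ip)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical forum posts collected from a forum tied to one city.
    posts = [
        "user A posted from 8.8.8.8, reply pending",
        "seen 192.168.1.1 and 999.1.1.1 in the log",
        "same poster again: 8.8.8.8.",
        "resolver at 1.1.1.1",
    ]
    for ip, n in candidate_landmarks(posts).most_common():
        print(ip, n)
```

In a pipeline like the one the abstract outlines, the addresses that survive this step would only be candidates. They would still have to pass the landmark evaluation stage (the improved GeoTrack) before being accepted as city-level landmarks.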
