Identifying website communities in mobile internet based on affinity measurement

With the rapid development of mobile devices and wireless technologies, mobile internet websites play an essential role in delivering networked services in our daily life. Identifying website communities in mobile internet is therefore of theoretical and practical significance for optimizing network resources and improving user experience. Existing solutions, however, are limited to retrieving website communities based on hyperlink structure and content similarity. The relationships between user behaviors and community structures are far from being understood. In this paper, we develop a three-step algorithm that extracts communities using an affinity measurement derived from user access information. Through experimental evaluation on massive, detailed HTTP traffic records captured from a cellular core network by high-performance monitoring devices, we show that our affinity-measurement-based method is effective in identifying hidden website communities in mobile internet that have evaded previous link-based and content-based approaches.
