Mining Websites Preferences on Web Events in Big Data Environment

Numerous websites publish web pages covering events that occur in society. Web events data exhibits the well-accepted attributes of big data: Volume, Velocity, Variety and Value. One valuable product of this data is website preferences, which can help followers of web events, e.g. people or organizations, select the right websites for the aspects of an event they are interested in. However, the large volume, fast evolution, multiple sources and unstructured nature of the data together make mining website preferences very challenging. In this paper, website preference is first formally defined. Then, based on the hierarchical nature of web events data, we propose a hierarchical network model that organizes the big data of a web event from different organizations, areas and nations at a given time stamp. With this hierarchical network structure in hand, two strategies are proposed to mine website preferences from web events data. The first, straightforward strategy uses the communities of the keyword-level network and the mapping relations between websites and keywords to unveil website preferences. The second strategy takes the whole hierarchical network structure into consideration and uses an iterative algorithm to refine the keyword communities on which the first strategy relies. Finally, an evaluation criterion for website preferences is designed to compare the performance of the two proposed strategies. Experimental results show that properly combining horizontal relations (within each level network) with vertical relations (mapping relations between the three level networks) can extract more value from web events data and thereby improve the efficiency of website preference mining.
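As a rough illustration of the first strategy, the sketch below groups keywords into communities and then scores each website by how its keyword mentions are distributed over those communities. It is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not specify a community detection method, so connected components of a thresholded co-occurrence graph are swapped in as a stand-in, and all names (`keyword_communities`, `website_preferences`, `min_weight`) and the sample data are hypothetical.

```python
from collections import defaultdict


def keyword_communities(cooccurrence, min_weight=2):
    """Stand-in community detection (an assumption, not the paper's method):
    connected components of the keyword graph after discarding edges
    lighter than min_weight."""
    neighbours = defaultdict(set)
    for (a, b), weight in cooccurrence.items():
        neighbours[a]  # register both keywords even if the edge is dropped
        neighbours[b]
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    community_of = {}
    next_id = 0
    for start in sorted(neighbours):
        if start in community_of:
            continue
        # Depth-first traversal labels every keyword reachable from start.
        community_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in community_of:
                    community_of[other] = next_id
                    stack.append(other)
        next_id += 1
    return community_of


def website_preferences(site_keywords, community_of):
    """Map each website to the share of its keyword mentions that falls
    in each keyword community (the website-to-keyword mapping relation)."""
    prefs = {}
    for site, counts in site_keywords.items():
        totals = defaultdict(int)
        for keyword, n in counts.items():
            if keyword in community_of:
                totals[community_of[keyword]] += n
        total = sum(totals.values())
        prefs[site] = {c: n / total for c, n in totals.items()} if total else {}
    return prefs


# Hypothetical data: keyword co-occurrence weights and per-site keyword counts.
cooccurrence = {
    ("earthquake", "rescue"): 5,
    ("rescue", "donation"): 3,
    ("earthquake", "economy"): 1,
    ("economy", "stock"): 4,
}
site_keywords = {
    "site_a": {"rescue": 6, "donation": 2, "stock": 2},
    "site_b": {"economy": 3, "stock": 1},
}

communities = keyword_communities(cooccurrence)
prefs = website_preferences(site_keywords, communities)
print(prefs)
```

In this toy data the weak earthquake/economy link is dropped, so the keywords split into a rescue-related community and an economy-related one, and each site's preference is its share of mentions in each. The second strategy described above would additionally refine these communities using the other network levels, which this sketch does not attempt.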
