Data Harvesting 2.0: from the Visible to the Invisible Web

Personal data fuel a fast-emerging industry that transforms them into added value. Harvesting these data is therefore of the utmost importance for the economy. In this paper, we study the flows of personal data at a global level and distinguish countries by their capacity to harvest data. We establish a cartography of international data channels on the visible and invisible Web. The visible Web is composed of the sites that are available to the general public and are typically indexed by search engines. The invisible Web refers to the tags, Web bugs, pixels and beacons that are embedded in Websites to track and profile users. It is well known that the US dominates the visible Web, with more than 70% of the top 100 sites in the world. We show that this domination is even stronger on the invisible Web: in most countries, the largest proportion of trackers is indeed from the US. Apart from the US, two countries exhibit an original strategy. China dominates its visible Web with a majority of local sites, but surprisingly these sites still contain a majority of US trackers. Russia also dominates its visible Web and is the only country with more local trackers than US ones.
