Website removal from search engines due to copyright violation

Purpose: The purpose of this paper is to clarify how many removal requests are made, how often, and by whom, as well as which websites are reported to search engines for removal from the search results.

Design/methodology/approach: The study analyzes more than 3.2bn pages removed from Google’s search results at the request of reporting organizations from 2011 to 2018, and over 460m pages removed from Bing’s search results at the request of reporting organizations from 2015 to 2017. The paper focuses on pages that belong to the .pl country code top-level domain (ccTLD).

Findings: Although the number of requests to remove data from search results has been growing year on year, fewer URLs have been reported in recent years. Some of the requests are, however, unjustified and are rejected by the teams representing the search engines. In terms of reporting copyright violations, one company in particular stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of the total number of reports).

Research limitations/implications: As not every request can be published, the study is based only on what is publicly available. Also, the data assigned to Poland are based only on the ccTLD domain name (.pl); other domain extensions used by Polish internet users were not considered.

Originality/value: This is the first global analysis of data from transparency reports published by search engine companies, as prior research has been based on specific notices.
