Web crawling ethics revisited: Cost, privacy, and denial of service

Ethical aspects of the use of Web crawlers in information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized, as is the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment. © 2006 Wiley Periodicals, Inc.
