Big Data and Their Epistemological Challenge

It is estimated that humanity accumulated 180 EB of data between the invention of writing and 2006. Between 2006 and 2011, the total grew ten times and reached 1,600 EB. This figure is now expected to grow fourfold approximately every 3 years. Every day, enough new data are being generated to fill all US libraries eight times over. As a result, there is much talk about “big data”. This special issue on “Evolution, Genetic Engineering and Human Enhancement”, for example, would have been inconceivable in an age of “small data”, simply because genetics is one of the datagreediest sciences around. This is why, in the USA, the National Institutes of Health (NIH) and the National Science Foundation (NSF) have identified big data as a programme focus. One of the main NSF–NIH interagency initiatives addresses the need for core techniques and technologies for advancing big data science and engineering (see NSF-12-499). Despite the importance of the phenomenon, it is unclear what exactly the term “big data” means and hence refers to. The aforementioned document specifies that: “The phrase ‘big data’ in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.” You do not need to be an analytic philosopher to find this both obscure and vague. Wikipedia, for once, is also unhelpful. Not because the relevant entry is unreliable, but because it reports the common definition, which is unsatisfactory: “data sets so large and complex that they become awkward to work with using onhand database management tools”. Apart from the circular problem of defining “big” with “large”, the definition suggests that data are too big or large only in relation to our current computational power. This is misleading. Of course, “big”, as many other terms, is a relational predicate: a pair of shoes is too big for you, but fine for me. It is also trivial to acknowledge that we tend to evaluate things non-relationally, in this case as absolutely big, whenever the frame of reference is obvious enough to be left Philos. Technol. (2012) 25:435–437 DOI 10.1007/s13347-012-0093-4