Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data

Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated. The next step is to incorporate an awareness of the pitfalls of aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence as a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs between, and synthesis of, precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services.
