Vesper: Visualising species archives

Abstract

Vesper (Visual Exploration of SPEcies-referenced Repositories) is a tool that visualises Darwin Core Archive (DwC-A) datasets. It aims to reduce the time and effort biologists spend ascertaining the quality of the data they generate or use. Currently, DwC-A quality checking is limited to tabular reports of data ‘existence’ and of compliance with DwC-A format guidelines, produced by the online DwC-A validator and reader. Whilst these tools thoroughly check that data is present and that its structure conforms to the DwC-A schema, they give no insight into the underlying quality of the data itself. Built on top of the D3 JavaScript library, Vesper analyses and displays DwC-A datasets in three fundamental dimensions (taxonomic, geographic and temporal), with a visualisation dedicated to each. By viewing a dataset's composition in these dimensions, a data consumer can judge whether it is suitable for the tasks or analyses they have in mind, whilst a data provider can identify where a dataset they have constructed may fall short in data quality, for example where it contains obviously incorrect data such as the classic longitude inversion that places North American specimens in China. A further visualisation of the taxonomic dimension can reveal the subtaxa distribution of reference taxonomies, whilst a simple table shows which data types are present or absent for each record, giving an overall data ‘existence’ profile for the dataset. Selections made in one visualisation are linked to the other visualisations of that dataset, so users can discover whether data quality issues are restricted to identifiable sub-portions of the dataset.
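The longitude inversion mentioned above can also be screened for programmatically. The following is a minimal sketch in Python, not Vesper's own method (Vesper surfaces such errors visually and is written in JavaScript). It assumes records are dictionaries keyed by the Darwin Core terms `decimalLatitude` and `decimalLongitude`; the `BBox` type, the function name and the rough North America bounding box are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class BBox(NamedTuple):
    """Hypothetical helper: an axis-aligned box in decimal degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def suspected_longitude_inversions(records: Iterable[Dict[str, str]],
                                   expected: BBox) -> List[Dict[str, str]]:
    """Return records that lie outside the expected region but would lie
    inside it if the sign of the longitude were flipped."""
    flagged = []
    for rec in records:
        try:
            lon = float(rec["decimalLongitude"])
            lat = float(rec["decimalLatitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparseable coordinates: nothing to test
        if not expected.contains(lon, lat) and expected.contains(-lon, lat):
            flagged.append(rec)
    return flagged


# Rough, illustrative bounding box; an assumption, not an authoritative extent.
NORTH_AMERICA = BBox(-170.0, 5.0, -50.0, 85.0)

records = [
    {"id": "a", "decimalLatitude": "39.7", "decimalLongitude": "-105.0"},
    {"id": "b", "decimalLatitude": "39.7", "decimalLongitude": "105.0"},  # sign lost
    {"id": "c", "decimalLatitude": "", "decimalLongitude": ""},           # no coordinates
]
flagged_ids = [r["id"] for r in suspected_longitude_inversions(records, NORTH_AMERICA)]
```

A check like this only flags candidates: a record with a positive longitude may be legitimate, so the flagged subset still needs human judgement, which is where a map view helps.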
Vesper can handle datasets of a million entities client-side, within a browser, through judicious data filtering: many of the data types within individual records are not needed to judge the geographic, temporal or taxonomic distribution and extent of a dataset, so many of the more verbose fields in the file can simply be passed over during an initial data decompression stage. Furthermore, Vesper can provide limited name and structure matching of a dataset against DwC-A packaged reference taxonomies to indicate data quality relative to sources outside the archive. A selection of annotated example scenarios shows how Vesper can reveal data quality issues in Darwin Core Archives.
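The field-filtering idea can be sketched as follows. This is an illustrative Python example under stated assumptions, not Vesper's implementation (which runs in the browser in JavaScript). It assumes the archive's core data file is tab-delimited with a header row of Darwin Core term names; in a real archive the column mapping is declared in `meta.xml`, which this sketch does not parse. The function name `read_filtered`, the `KEEP` whitelist and the file name `occurrence.txt` are hypothetical.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator, Sequence

# Columns assumed sufficient for taxonomic, geographic and temporal views.
KEEP = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")


def read_filtered(archive, member: str, keep: Sequence[str] = KEEP,
                  delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Stream rows from one file inside a zip, keeping only wanted columns.

    Rows are decompressed and parsed one at a time, and unwanted fields are
    discarded immediately, so verbose columns never accumulate in memory.
    """
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.reader(text, delimiter=delimiter)
        header = next(reader)
        wanted = [(i, name) for i, name in enumerate(header) if name in keep]
        for row in reader:
            # Short rows are padded with empty strings rather than failing.
            yield {name: (row[i] if i < len(row) else "") for i, name in wanted}


# Build a tiny in-memory archive to demonstrate the filter.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "occurrence.txt",
        "occurrenceID\tscientificName\tdecimalLatitude\tdecimalLongitude"
        "\teventDate\toccurrenceRemarks\n"
        "1\tPuma concolor\t39.7\t-105.0\t1987-06-02\ta long free-text note\n",
    )
buf.seek(0)
rows = list(read_filtered(buf, "occurrence.txt"))
```

Here the identifier and the free-text remarks column are dropped as each row is read, leaving only the four fields needed for the three views.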
