Making geological sense of 'Big Data' in sedimentary provenance analysis

Abstract

Sedimentary provenance studies increasingly apply multiple chemical, mineralogical and isotopic proxies to many samples. The resulting datasets are often so large (containing thousands of numerical values) and complex (comprising multiple dimensions) that it is warranted to use the Internet-era term ‘Big Data’ to describe them. This paper introduces Multidimensional Scaling (MDS), Generalised Procrustes Analysis (GPA) and Individual Differences Scaling (INDSCAL, a type of ‘3-way MDS’ algorithm) as simple yet powerful tools to extract geological insights from ‘Big Data’ in a provenance context. Using a dataset from the Namib Sand Sea as a test case, we show how MDS can be used to visualise the similarities and differences between 16 fluvial and aeolian sand samples for five different provenance proxies, resulting in five different ‘configurations’. These configurations can be fed into a GPA algorithm, which translates, rotates, scales and reflects them to extract a ‘consensus view’ for all the data considered together. Alternatively, the five proxies can be jointly analysed by INDSCAL, which fits the data with not one but two sets of coordinates: the ‘group configuration’, which strongly resembles the graphical output produced by GPA, and the ‘source weights’, which can be used to attach geological meaning to the group configuration. For the Namib study, the three methods paint a detailed and self-consistent picture of a sediment routing system in which sand composition is determined by the combination of provenance and hydraulic sorting effects.
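The two building blocks named in the abstract, MDS and Procrustes alignment, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: it uses classical (Torgerson) MDS rather than any particular MDS variant the authors may have applied, it aligns only two configurations (an ordinary Procrustes fit, the pairwise step that GPA generalises to many configurations), and the 16 "samples" are synthetic random points rather than the Namib data. The function names `classical_mds`, `procrustes_align` and `pairwise` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in k dimensions."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(b)            # eigenvalues returned in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def procrustes_align(x, y):
    """Translate, scale and rotate/reflect y so that it best matches x (least squares)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(yc.T @ xc)
    r = u @ vt                                # optimal orthogonal matrix (rotation or reflection)
    scale = s.sum() / (yc ** 2).sum()         # optimal isotropic scaling factor
    return scale * yc @ r + x.mean(axis=0)

def pairwise(p):
    """Euclidean distance matrix between the rows of p."""
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)

# Synthetic stand-in for 16 samples seen through two hypothetical proxies.
# The second proxy sees the same structure, but rotated and rescaled.
rng = np.random.default_rng(0)
truth = rng.normal(size=(16, 2))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

conf_a = classical_mds(pairwise(truth))              # configuration for proxy A
conf_b = classical_mds(pairwise(2.5 * truth @ rot))  # configuration for proxy B

# Align B onto A and measure the remaining root-mean-square misfit.
aligned_b = procrustes_align(conf_a, conf_b)
misfit = np.sqrt(((conf_a - aligned_b) ** 2).mean())
print("RMS misfit after Procrustes alignment:", misfit)
```

Because both synthetic proxies encode the same underlying structure, the alignment should leave essentially no misfit. With real provenance proxies the residual would be non-zero, and a consensus configuration would be obtained by iterating this alignment over all configurations, which is the role GPA plays in the abstract.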
