Big Data Techniques for Applied Geoscience: Compute and Communicate

Big Data techniques could change the paradigm of applied geoscience if they are widely adopted. Many of these techniques, grouped under the umbrella of Earth informatics, apply Machine Learning to high-dimensional data to create new forms of value. This contribution presents two case studies of successful Earth informatics computation, together with the communication of the value of their results, and uses them to provide insight into the uptake of ‘Big Data’ in the geosciences. Machine Learning techniques fall naturally into two groups: supervised and unsupervised. Supervised algorithms, such as Random Forests™ (RF), support vector machines and neural networks, share the concept of training a classifier on an initial (training) dataset of labelled examples. They are generally applied to predictive tasks, such as our first case study: predicting lithology from remote sensing and airborne geophysical data. Unsupervised algorithms, such as Self-Organising Maps (SOM), allow patterns inherent in the data to emerge without a training dataset. They are generally applied to exploratory tasks, such as our second case study, which identifies new, potentially prospective river catchments. We find that explicitly calculating and presenting the new value extracted by the computation is an essential component of post-compute evaluation. As strong advocates for using a range of Big Data techniques in applied geoscience, we conclude that the benefits gained from the way we ‘compute’ can be lost if we do not also take considerable care with the way we ‘communicate’.
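
The supervised/unsupervised distinction drawn above can be made concrete with a small, self-contained sketch. This is illustrative only and does not reproduce the authors' case studies. It assumes pure Python and hand-rolls two things. The first is a minimal Random Forest: bootstrapped, depth-limited Gini trees that consider a random subset of features at each split, combined by majority vote. The second is a minimal online SOM with a Gaussian neighbourhood on a 2 × 2 grid. The synthetic two-feature data, the labels `unit_A`/`unit_B`, and every parameter value (tree count, depth, grid size, learning-rate and radius schedules) are hypothetical choices made for demonstration.

```python
import math
import random
from collections import Counter

# --- Supervised: minimal Random Forest (bagged Gini trees, random features per split) ---

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (used for leaf predictions and the forest vote)."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, features):
    """Best (weighted Gini, feature, threshold) over the candidate features, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [c for row, c in zip(X, y) if row[f] <= t]
            right = [c for row, c in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth, n_feat, rng):
    """Grow a tree; every node looks at only n_feat randomly chosen features."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feat))
    if split is None:
        return majority(y)
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_feat, rng)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_feat, rng)
    return (f, t, left, right)

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, depth=4, n_feat=1, seed=0):
    """Each tree is trained on a bootstrap sample (drawn with replacement) of the labelled data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_feat, rng))
    return forest

def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])

# --- Unsupervised: minimal online Self-Organising Map (no labels used) ---

def best_matching_unit(W, x):
    """Grid node whose weight vector is closest (squared Euclidean distance) to x."""
    return min(W, key=lambda n: sum((w - v) ** 2 for w, v in zip(W[n], x)))

def train_som(X, rows=2, cols=2, epochs=50, seed=0):
    rng = random.Random(seed)
    W = {(r, c): [rng.random() for _ in range(len(X[0]))]
         for r in range(rows) for c in range(cols)}
    for e in range(epochs):
        frac = 1 - e / epochs
        lr = 0.5 * frac                              # learning rate decays linearly
        sigma = 0.5 * max(rows, cols) * frac + 0.1   # neighbourhood radius shrinks
        for x in rng.sample(X, len(X)):              # one shuffled pass per epoch
            win = best_matching_unit(W, x)
            for n in W:
                d2 = (n[0] - win[0]) ** 2 + (n[1] - win[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood on the grid
                W[n] = [w + lr * h * (v - w) for w, v in zip(W[n], x)]
    return W

# --- Demo on synthetic, hypothetical data: two well-separated groups ---
rng = random.Random(42)
X = [[rng.gauss(0.2, 0.05), rng.gauss(0.8, 0.05)] for _ in range(30)]
X += [[rng.gauss(0.8, 0.05), rng.gauss(0.2, 0.05)] for _ in range(30)]
y = ["unit_A"] * 30 + ["unit_B"] * 30  # hypothetical lithology labels

forest = train_forest(X, y)  # supervised: needs the labels y
print(predict_forest(forest, [0.2, 0.8]), predict_forest(forest, [0.8, 0.2]))  # unit_A unit_B

som = train_som(X)  # unsupervised: sees only X
print(best_matching_unit(som, [0.2, 0.8]), best_matching_unit(som, [0.8, 0.2]))
```

The difference between the two families shows up in the calls. `train_forest` cannot run without the labels `y`. `train_som` sees only `X`, so any grouping it finds, such as the two groups mapping to different grid nodes, still has to be interpreted afterwards.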
