Big Data in the physical sciences: challenges and opportunities

We provide a brief overview of the challenges and opportunities that arise in the physical sciences from data that are extreme in volume, acquisition rate, and heterogeneity. Such data require novel methodological approaches, ranging from data management and analysis techniques to inference and modelling, and so forge close links with cutting-edge research in data science. We highlight past methodological breakthroughs with high impact, showcase selected current data science challenges across the physical sciences disciplines, and discuss the broad needs of the community in the era of Big Data. We argue that the recently founded Alan Turing Institute (ATI) is ideally positioned to facilitate and intensify the mutually beneficial cross-fertilisation between core data science developments and their application in the physical sciences. We propose concrete measures to achieve these goals, which are critical for ensuring impact and continued research leadership in both fields.