Data learning from big data

Abstract

Technology is generating a huge and growing volume of observations of diverse nature. This big data is establishing data learning as a central scientific discipline, one that includes the collection, storage, preprocessing, visualization and, above all, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics in some of the issues raised by big data in this new paradigm, and we propose the name data learning to describe all the activities that make it possible to obtain relevant knowledge from this new source of information.
