A formal definition of Big Data based on its essential features

Purpose – The purpose of this paper is to identify and describe the most prominent research areas connected with “Big Data” and to propose a thorough definition of the term.

Design/methodology/approach – The authors analysed a sizeable corpus of industry and academic articles linked with Big Data to find commonalities among the topics they treated. The authors also compiled a survey of existing definitions with a view to generating a more solid one that encompasses most of the work happening in the field.

Findings – The main themes of Big Data are: information, technology, methods and impact. The authors propose a new definition for the term that reads as follows: “Big Data is the Information asset characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value.”

Practical implications – The formal definition that is proposed can enable a more coherent development of the concept of Big Data, as it solely relies on ...
