A view on big data and its relation to Informetrics

Purpose: Big data offer a huge challenge. Their very existence leads to a paradox: the more data we have, the less accessible they become, because the particular piece of information one is searching for may be buried among terabytes of other data. In this contribution we discuss the origin of big data and point to three challenges that arise with big data: data storage, data processing and generating insights. Design/methodology/approach: Computer-related challenges can be expressed by the CAP theorem, which states that a distributed application can simultaneously provide at most two of the following three properties: consistency (C), availability (A) and partition tolerance (P). As an aside we mention Amdahl’s law and its application to scientific collaboration. We further discuss data mining in large databases and knowledge representation for handling the results of data mining exercises. We also offer a short informetric study of the field of big data, and point to the ethical dimension of the big data phenomenon. Findings: There are still serious problems to overcome before the field of big data can deliver on its promises. Implications and limitations: This contribution offers a personal view, focusing on the information science aspects; much more can be said about software aspects. Originality/value: We express the hope that information scientists, including librarians, will be able to play their full role within the knowledge discovery, data mining and big data communities, leading to exciting developments, the reduction of scientific bottlenecks and truly innovative applications.
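
The abstract mentions Amdahl’s law without stating it. As an illustration only, the sketch below uses the standard textbook form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelised and n is the number of workers. The function name `amdahl_speedup` and its parameters are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by the textbook form of Amdahl's law.

    parallel_fraction: share of the total work that can run in parallel (0..1).
    workers: number of processors (or, by analogy, collaborators).
    Both names are illustrative assumptions, not notation from the paper.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected by extra workers; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% parallelisable work, the speedup can never exceed 1 / 0.1 = 10,
    # however many workers are added.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Under this form the serial fraction bounds the achievable speedup, which is presumably why the law lends itself to an analogy with scientific collaboration: adding collaborators helps only with the part of the work that can be divided.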
