In 2011 the term "Big Data" was introduced by Gartner [5], and its use in the literature has increased steadily since then, including in (bio)medical research [1]. Although the term is widely used, studies show that its meaning is much debated and that many different definitions exist [10]. This variety of definitions may lead to different understandings and therefore to difficulties in communication. For example, a researcher who is looking for "Big Data" solutions might miss an interesting method that is not tagged as such. In previous work we studied the major topics that appear in Big Data literature using a Topic Modelling approach [8]. However, that study could not determine whether those topics are exclusive to publications that self-identify as Big Data (BD). Therefore, here we investigate the research question: What are the differences between topics in BD and non-Big Data (NBD) corpora?
[1] Leo Breiman, et al. Random Forests. Machine Learning, 2001.
[2] Aeilko H. Zwinderman, et al. Understanding big data themes from scientific biomedical literature through topic modeling. Journal of Big Data, 2016.
[3] Valerio Persico, et al. Big Data for Health. Encyclopedia of Big Data Technologies, 2019.
[4] Andrea De Mauro, et al. A formal definition of Big Data based on its essential features. 2016.
[5] R Core Team, et al. R: A language and environment for statistical computing. 2014.
[6] Adam Barker, et al. Undefined By Data: A Survey of Big Data Definitions. ArXiv, 2013.
[7] David M. Blei, et al. Probabilistic topic models. Commun. ACM, 2012.