Human resources for Big Data professions: A systematic classification of job roles and required skill sets

Abstract

The rapid expansion of Big Data Analytics is forcing companies to rethink their Human Resource (HR) needs. At the same time, however, it is unclear which job roles and skills make up this area. This study therefore aims to bring clarity to the heterogeneous skills required in Big Data professions by analyzing a large number of real-world job posts published online. More precisely, we: 1) identify four Big Data ‘job families’; 2) recognize nine homogeneous groups of Big Data skills (skill sets) that companies are demanding; and 3) characterize each job family by the level of competence it requires within each Big Data skill set. We propose a novel, semi-automated, fully replicable analytical methodology that combines machine learning algorithms with expert judgement. Our analysis draws on a large set of online job posts, obtained through web scraping, to generate an intelligible classification of job roles and skill sets. The results can support business leaders and HR managers in establishing clear strategies for acquiring and developing the skills needed to make the best use of Big Data. Moreover, the structured classification of job families and skill sets will help establish a common dictionary for HR recruiters and education providers, so that supply and demand can meet more effectively in the job marketplace.
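
The third step, characterizing each job family by the competence it requires in each skill set, can be illustrated with a minimal sketch. It assumes that job posts have already been labeled with a job family and that each skill set is represented by a hand-picked keyword list. The skill-set names, keywords, sample posts, and the `skill_profile` function are all hypothetical, and the share of posts mentioning a skill set is used only as a simple proxy for required competence; this is not the measure or the algorithm used in the study.

```python
import re
from collections import defaultdict

# Hypothetical skill sets with illustrative keywords (assumptions,
# not the nine skill sets identified in the study).
SKILL_SETS = {
    "statistics": {"regression", "statistics", "hypothesis"},
    "programming": {"python", "java", "scala"},
    "big_data_platforms": {"hadoop", "spark", "hive"},
}


def skill_profile(posts, skill_sets):
    """Return {job_family: {skill_set: share of posts mentioning it}}.

    posts: iterable of (job_family, text) pairs whose family labels are
    assumed to be assigned beforehand (e.g. by clustering plus expert review).
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for family, text in posts:
        # Lowercase and split on non-alphanumeric characters.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        totals[family] += 1
        for name, keywords in skill_sets.items():
            # A post counts once per skill set if any keyword appears.
            if tokens & keywords:
                counts[family][name] += 1
    return {
        family: {name: counts[family][name] / totals[family] for name in skill_sets}
        for family in totals
    }


# Hypothetical sample posts, for illustration only.
posts = [
    ("developer", "Experience with Java and Hadoop required"),
    ("developer", "Strong Python skills; Spark a plus"),
    ("analyst", "Regression and statistics background; some Python"),
]
profile = skill_profile(posts, SKILL_SETS)
print(profile)
```

A keyword-overlap share is deliberately crude; it keeps the example transparent, whereas a real pipeline would derive skill sets from the data and refine them with expert judgement.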
