EMR: Scalable Clustering of Big HR Data using Evolutionary MapReduce

The volume and variety of generated data, and the question of how to process it and create value through scalable analytics, are major challenges for industry and for real-world practices such as talent analytics. For instance, large enterprises and job centres must perform data-intensive matching of job seekers to many job positions at the same time. In other words, the matching should result in the large-scale assignment of best-fit (right) talents (Person) with the right expertise (Profession) to the right job (Position) at the right time (Period). We call this the 4P rule in this paper. All enterprises should consider the 4P rule in their daily recruitment processes to support efficient workforce development strategies. This requires integrating large volumes of disparate data from various sources and calls for scalable algorithms and analytics. The diversity of data in human resource management (HRM) makes it necessary to speed up analytical processes. The main challenge is not only how and where to store the data, but also how to analyse it to create value (knowledge discovery). In this paper, we propose a generic Career Knowledge Representation (CKR) model that can represent most of the competences found across a wide variety of careers. Regenerated job qualification data for 15 million employees with 84 dimensions (competences), derived from real HRM data, was used to test and evaluate the proposed Evolutionary MapReduce (EMR) K-Means method. The proposed EMR method produced faster and more accurate experimental results than similar approaches; it was tested with real large-scale datasets, and the achieved results are discussed.
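The abstract does not spell out the internals of the EMR method, so the following is only a minimal, single-process sketch of the general idea behind an evolutionary K-Means expressed in map and reduce phases. It is not the authors' implementation. All function names, the mutation scheme (Gaussian noise with a hypothetical `sigma`), the truncation selection, and the use of the sum of squared errors as fitness are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points per centroid index.

    A centroid that received no points keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(old)
    return updated


def sse(points, centroids):
    """Fitness (assumed): sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def evolutionary_kmeans(points, k, pop_size=6, generations=10, sigma=0.1, seed=0):
    """Evolve a population of candidate centroid sets (illustrative only)."""
    rng = random.Random(seed)
    # Each individual is one candidate set of k centroids drawn from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Refine every individual with one map/reduce K-Means step.
        population = [reduce_phase(map_phase(points, c), c) for c in population]
        # Selection: keep the better half according to SSE.
        population.sort(key=lambda c: sse(points, c))
        survivors = population[: max(1, pop_size // 2)]
        # Mutation: refill the population with perturbed copies of survivors.
        children = [
            [tuple(x + rng.gauss(0, sigma) for x in c) for c in rng.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=lambda c: sse(points, c))


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (10.0, 10.0), (10.2, 9.9), (9.9, 10.1)]
    print(evolutionary_kmeans(data, k=2))
```

In a real MapReduce deployment the map and reduce phases would run as distributed tasks over data partitions; here they are ordinary Python functions so that the control flow stays easy to follow.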
