Classifying Online Job Advertisements through Machine Learning

Abstract

The rapid growth of Web usage for advertising job positions provides a great opportunity for real-time labour market monitoring. This is the aim of Labour Market Intelligence (LMI), a field that is becoming increasingly relevant to the design and evaluation of EU labour market policies. The analysis of Web job vacancies gives labour market stakeholders a competitive advantage over classical survey-based analyses: it reduces the time-to-market of the analysis and supports a shift towards fact-based decision making. In this paper, we present our approach for automatically classifying millions of Web job vacancies according to a standard taxonomy of occupations. We show how this problem has been framed as a text classification task addressed through machine learning. We also show how our approach has been applied in real-life projects and discuss the benefits it provides to end users.
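The abstract frames occupation coding as supervised text classification: each vacancy text is mapped to exactly one label from a fixed taxonomy. The following is a minimal sketch of that framing in plain Python, using a multinomial Naive Bayes classifier over bag-of-words features. The classifier, the tokenizer, the toy advertisements and the label names (`software_developer`, `cook`, `sales_rep`) are all illustrative assumptions; they are not the taxonomy, data or model used in the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesOccupationClassifier:
    """Hypothetical illustration: multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of ads
        self.word_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        """Unnormalised log P(label | text) under the bag-of-words assumption."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text):
        """Return the occupation label with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Toy labelled vacancies (invented for illustration only).
ads = [
    "Java developer wanted for backend software projects",
    "Senior software engineer, Python and cloud services",
    "Chef de partie needed for busy restaurant kitchen",
    "Line cook to prepare meals in hotel kitchen",
    "Sales representative to manage business clients and meet sales targets",
    "Field sales agent, commission on client sales",
]
labels = ["software_developer", "software_developer", "cook", "cook", "sales_rep", "sales_rep"]

clf = NaiveBayesOccupationClassifier().fit(ads, labels)
print(clf.predict("Backend developer with Java experience"))  # → software_developer
```

Naive Bayes is used here only because it fits in a few lines. Swapping in another classifier changes `fit` and `predict`, while the overall framing (vacancy text in, occupation label out) stays the same.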
