Challenge: Processing web texts for classifying job offers

Today, the Web is a rich source of labour market data for both public and private operators, as a growing number of job offers are advertised through Web portals and services. In this paper, we apply and compare several techniques, namely explicit rules, machine learning, and LDA-based algorithms, to classify a real dataset of Web job offers collected from 12 heterogeneous sources against a standard classification system of occupations.
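To make the first two families of techniques concrete, here is a minimal sketch. It is not the authors' implementation. It shows an explicit-rules classifier that matches hand-written keywords, and a multinomial Naive Bayes classifier, chosen here only as one simple example of a machine learning approach. The occupation labels (`SOFTWARE_DEV`, `NURSE`, `COOK`), the keyword lists, and the training offers are hypothetical placeholders. A real setup would map offers to the codes of a standard occupation taxonomy and train on a labelled corpus. The LDA-based approach is not sketched.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical occupation labels standing in for codes of a standard taxonomy;
# the keyword lists are illustrative, not taken from the paper.
RULES = {
    "SOFTWARE_DEV": ["developer", "java", "python", "software engineer"],
    "NURSE": ["nurse", "nursing", "patient care"],
    "COOK": ["cook", "chef", "kitchen"],
}

# Hypothetical labelled job offers used to train the statistical classifier.
TRAIN = [
    ("Java developer for backend services", "SOFTWARE_DEV"),
    ("Senior Python software engineer", "SOFTWARE_DEV"),
    ("Registered nurse for patient care ward", "NURSE"),
    ("Nursing assistant in hospital", "NURSE"),
    ("Chef de partie for restaurant kitchen", "COOK"),
    ("Line cook, evening shifts", "COOK"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_by_rules(text, rules=RULES):
    """Explicit rules: return the label whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        label: sum(len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered)) for kw in kws)
        for label, kws in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    model = NaiveBayes().fit(texts, labels)
    offer = "Experienced nurse for hospital ward"
    print(classify_by_rules(offer), model.predict(offer))  # prints: NURSE NURSE
```

The rules are transparent and need no labelled data, but they must be curated by hand. The learned model needs labelled offers, but it derives its term weights automatically.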
