Employers’ Expectations: A Probabilistic Text Mining Model

Abstract: This study uses text mining techniques to analyze employment data posted on the internet. The objective is to identify the knowledge areas, skills, and expertise relevant to jobs in the construction industry. We used fast-growing online job search engines to understand the construction job market and employer expectations. Over 20,000 job advertisements were downloaded from various websites between October 14, 2012 and March 15, 2013. We developed a text mining method to extract job qualification information from the downloaded pages. The algorithm derives selection rules by automatically extracting statistically significant patterns present in a set of preselected qualifications. These rules can then be used to detect qualifications in new pages. Once the qualifications were identified, we used the Latent Dirichlet Allocation (LDA) model to identify groups of skills required by employers. A major advantage of the LDA model is that it is unsupervised and requires no labeled training data. The algorithm was applied to a case study as an illustrative example.
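
To make the LDA step concrete, the following is a minimal sketch of how extracted qualification phrases could be grouped into skill topics without labels. It is not the implementation used in this study: it uses a plain collapsed Gibbs sampler, and the function names (`lda_gibbs`, `top_words`), the hyperparameters (`alpha`, `beta`, `n_iter`), and the toy qualification phrases are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the study's code).

    docs: list of token lists. Returns (vocab, topic-word counts, doc-topic counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]             # tokens per (document, topic)
    n_kw = [[0] * n_vocab for _ in range(n_topics)]   # tokens per (topic, word)
    n_k = [0] * n_topics                              # tokens per topic
    z = []                                            # topic assignment of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(assignments)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_kw, n_dk

def top_words(vocab, n_kw, k, n=3):
    """Return the n most frequent words assigned to topic k."""
    order = sorted(range(len(vocab)), key=lambda v: n_kw[k][v], reverse=True)
    return [vocab[v] for v in order[:n]]

# Hypothetical qualification phrases, already tokenized (made up for this sketch).
docs = [
    "cost estimating scheduling budget".split(),
    "scheduling budget cost control".split(),
    "autocad revit drafting modeling".split(),
    "revit modeling autocad design".split(),
]
vocab, n_kw, n_dk = lda_gibbs(docs, n_topics=2)
for k in range(2):
    print(k, top_words(vocab, n_kw, k))
```

No labels are supplied at any point: the sampler sees only word co-occurrence within each phrase. On a real corpus, the number of topics and the two smoothing parameters would need tuning, and a library implementation would normally be preferred over this hand-written sampler.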