Linguistic information extraction for job ads (SIRE project)

Each job advertisement is a text that carries rich information about the occupation it describes, notably its competence needs (i.e. required degrees, field knowledge, task expertise or technical skills). To facilitate access to this information, the SIRE project conducted a corpus-based study of how to combine HR expert ontologies with modern semi-supervised information extraction techniques. An adaptive semantic labeling framework is being developed through parallel work on retrieval rules and on latent semantic lexicons of terms and jargon phrases. In its operational stage, our prototype will collect online job ads and index their content as detailed RDF triples compatible with applications ranging from enhanced job search to automated labor-market analysis.
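
To make the indexing step more concrete, the following minimal sketch shows how competence mentions found in an ad could be turned into RDF triples in N-Triples syntax. It is an illustration only, not the SIRE implementation: the namespace `http://example.org/sire/`, the predicate names, the toy `LEXICON` and the functions `extract_competences` and `to_ntriples` are all hypothetical, and a fixed dictionary lookup stands in for the project's retrieval rules and latent semantic lexicons.

```python
import re

# Hypothetical namespace, chosen for illustration only.
BASE = "http://example.org/sire/"

# Toy lexicon mapping surface terms to competence categories.
# Assumed for this sketch; the project's lexicons are learned from a corpus.
LEXICON = {
    "python": "technical_skill",
    "sql": "technical_skill",
    "master's degree": "required_degree",
    "project management": "task_expertise",
}


def extract_competences(text):
    """Return (term, category) pairs found in the ad text, in lexicon order."""
    lowered = text.lower()
    found = []
    for term, category in LEXICON.items():
        # Require non-word characters around the term so that "sql"
        # does not match inside "mysql".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append((term, category))
    return found


def to_ntriples(ad_id, competences):
    """Serialize extracted competences as N-Triples lines for one ad."""
    subject = f"<{BASE}ad/{ad_id}>"
    lines = []
    for term, category in competences:
        predicate = f"<{BASE}requires/{category}>"
        # Escape backslashes and double quotes inside the literal.
        literal = term.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


if __name__ == "__main__":
    ad = "We seek a data analyst with a Master's degree, strong SQL and Python skills."
    for triple in to_ntriples("42", extract_competences(ad)):
        print(triple)
```

Each ad becomes the subject of one triple per detected competence, with the competence category encoded in the predicate. A real pipeline would more likely link to ontology concepts (URIs) rather than plain string literals, so that job search and labor-market analysis can query by concept instead of surface form.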