JobOlize - Headhunting by Information Extraction in the Era of Web 2.0

E-recruitment is one of the most successful e-business applications, supporting both headhunters and job seekers. The explosive growth of online job offers makes information extraction techniques a necessity for building, e.g., job portals in a semi-automatic way. Existing approaches, however, hardly cope with the heterogeneous and semi-structured nature of job offers, nor do they consider the potential offered by Web 2.0 technologies. This paper proposes an information extraction system called “JobOlize”, realized for arbitrarily structured IT job offers. To improve extraction quality, it employs a hybrid approach that combines existing NLP techniques with a new form of context-driven extraction incorporating layout, structure, and content information. A rich client interface lets users adapt the extraction results while preserving the look and feel of the original Web pages. The improvements in extraction quality are justified on the basis of a case study, and the experiences gained are generalized and critically reflected upon in a discussion of lessons learned.
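The abstract does not say how layout, structure, and content cues are combined. The following toy sketch is not JobOlize's actual algorithm. It only illustrates the general idea of context-driven extraction: the same dictionary (content) match receives a different label depending on the heading it appears under (structure) and whether it is rendered as a list item (layout). The skill gazetteer, the heading keywords, the labels, and the names `Segment` and `extract_skills` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical gazetteer of IT skill terms (content cue); not taken from JobOlize.
SKILL_TERMS = {"java", "sql", "linux", "python", "xml"}

# Hypothetical heading keywords that signal a requirements section (structure cue).
REQUIREMENT_HEADINGS = ("requirement", "your profile", "qualification")


@dataclass
class Segment:
    heading: str        # nearest enclosing heading on the page (structure)
    is_list_item: bool  # whether the text is rendered as a bullet/list item (layout)
    text: str           # the segment's text (content)


def extract_skills(segments):
    """Return (skill, label) pairs; the label depends on the segment's context."""
    results = []
    for seg in segments:
        # Content cue: normalize tokens and intersect them with the gazetteer.
        tokens = {t.strip(".,;:()").lower() for t in seg.text.split()}
        found = sorted(tokens & SKILL_TERMS)
        # Structure cue: is the segment under a requirements-like heading?
        in_req = any(k in seg.heading.lower() for k in REQUIREMENT_HEADINGS)
        for skill in found:
            # Layout cue strengthens the decision when the heading already suggests it.
            if in_req and seg.is_list_item:
                label = "required"
            elif in_req:
                label = "likely_required"
            else:
                label = "mentioned"
            results.append((skill, label))
    return results


segments = [
    Segment("About us", False, "We build Java tools for banks."),
    Segment("Your profile", True, "Solid SQL and Linux skills."),
]
print(extract_skills(segments))
# → [('java', 'mentioned'), ('linux', 'required'), ('sql', 'required')]
```

A purely content-based extractor would treat "Java" in the company description the same as "SQL" in the requirements list. Adding even crude context cues separates the two, which is the kind of distinction the abstract attributes to context-driven extraction.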
