CompanyDepot: Employer Name Normalization in the Online Recruitment Industry

Entity linking maps entity mentions in text to the corresponding entities in a knowledge base (KB) and has many applications in both open-domain and domain-specific settings. For example, in the recruitment domain, linking employer names in job postings or resumes to entities in an employer KB is very important to many business applications. In this paper, we focus on this employer name normalization task, which has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location context, and handling name variations, irrelevant input data, and noise in the KB. We present a system called CompanyDepot, which contains a machine-learning-based approach, CompanyDepot-ML, and a heuristic approach, CompanyDepot-H, to address these challenges in three steps: (1) searching for candidate entities using a search engine customized for the KB; (2) ranking the candidate entities using learning-to-rank methods or heuristics; and (3) validating the top-ranked entity via binary classification or heuristics. While CompanyDepot-ML shows better extensibility and flexibility, CompanyDepot-H serves as a strong baseline and a useful way to collect training data for CompanyDepot-ML. Over multiple real-world datasets, the proposed system achieves 2.5% to 21.4% higher coverage at the same precision level than an existing system used at CareerBuilder. Applying the system to the similar task of academic institution name normalization further demonstrates the generalization ability of the method.
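The three-step flow of search, rank, and validate can be illustrated with a minimal Python sketch. It is not the CompanyDepot implementation: the toy KB, the token-overlap search, the string-similarity score with a location bonus, and the 0.7 threshold are all assumptions chosen for illustration. The sketch is closest in spirit to a heuristic variant; a learned ranker and a binary classifier would replace `score` and the threshold check in a machine-learning variant.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Employer:
    entity_id: str
    name: str
    city: str


# Hypothetical toy KB; a real employer KB would be far larger and noisier.
KB = [
    Employer("e1", "CareerBuilder", "Chicago"),
    Employer("e2", "Acme Staffing", "Atlanta"),
    Employer("e3", "Acme Foods", "Denver"),
]


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def search(query: str, kb: List[Employer]) -> List[Employer]:
    """Step 1: retrieve candidates sharing at least one token with the query."""
    q = tokens(query)
    return [e for e in kb if q & tokens(e.name)]


def score(query: str, city: Optional[str], entity: Employer) -> float:
    """Step 2: heuristic score = name similarity + bonus for a matching city."""
    sim = SequenceMatcher(None, query.lower(), entity.name.lower()).ratio()
    bonus = 0.2 if city and city.lower() == entity.city.lower() else 0.0
    return sim + bonus


def normalize(query: str, city: Optional[str] = None,
              kb: List[Employer] = KB,
              threshold: float = 0.7) -> Optional[Employer]:
    """Run search, rank, and validate; return None when no entity is accepted."""
    candidates = search(query, kb)
    if not candidates:
        return None  # irrelevant input, e.g. "Self employed"
    best = max(candidates, key=lambda e: score(query, city, e))
    # Step 3: accept the top-ranked entity only if its score clears the threshold.
    return best if score(query, city, best) >= threshold else None


if __name__ == "__main__":
    print(normalize("Acme Staffing Inc", "Atlanta"))
    print(normalize("Acme", "Denver"))
    print(normalize("Self employed", "Chicago"))
```

In this sketch the location context matters for ambiguous inputs: the bare name "Acme" is rejected without a city, but is accepted as the Denver entity when the city matches, because the location bonus lifts its score over the assumed threshold.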
