A Graph-Based Approach to Skill Extraction from Text

This paper presents a system that extracts skills from text documents: given an input text, it outputs a list of professional skills relevant to that text. We argue that the system can be practical for hiring and for managing personnel in an organization. We use the texts and the hyperlink graph of Wikipedia, as well as a list of professional skills obtained from the LinkedIn social network. The system first computes similarities between the input document and the texts of Wikipedia pages, and then applies a biased, hub-avoiding version of the Spreading Activation algorithm on the Wikipedia graph to associate the input document with skills.
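
The second stage described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy graph, the initial similarity scores, the `decay` and `hub_penalty` parameters, and the choice of penalizing a page by its in-degree are all assumptions made for illustration. The function name `spread_activation` is likewise hypothetical.

```python
from collections import defaultdict


def spread_activation(graph, initial, skills, steps=3, decay=0.5, hub_penalty=1.0):
    """Rank skill pages by activation spread from the initially active pages.

    graph:   dict mapping a page to the list of pages it links to
    initial: dict mapping a page to its text similarity with the input document
    skills:  set of pages that correspond to professional skills
    """
    # Assumption: a page's in-degree measures how much of a hub it is.
    in_degree = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            in_degree[dst] += 1

    activation = dict(initial)
    total = defaultdict(float, initial)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, value in activation.items():
            targets = graph.get(node, [])
            if not targets:
                continue
            # Hub-avoiding bias: a neighbor with many incoming links gets a smaller share.
            weights = [in_degree[t] ** -hub_penalty for t in targets]
            norm = sum(weights)
            for t, w in zip(targets, weights):
                nxt[t] += decay * value * w / norm
        activation = nxt
        for node, value in nxt.items():
            total[node] += value

    # Keep only skill pages that received some activation, highest first.
    ranked = sorted(((total[s], s) for s in skills if total[s] > 0), reverse=True)
    return [s for _, s in ranked]


# Toy data (hypothetical): "Computer" is a hub that every page links to.
graph = {
    "Python": ["Software engineering", "Computer"],
    "Machine learning": ["Data analysis", "Computer"],
    "Statistics": ["Data analysis", "Computer"],
}
initial = {"Python": 0.5, "Machine learning": 0.9, "Statistics": 0.4}
skills = {"Data analysis", "Software engineering"}

print(spread_activation(graph, initial, skills))
# ['Data analysis', 'Software engineering']
```

In this sketch the hub page "Computer" receives a reduced share of each neighbor's activation, and only pages in the skill list are returned. Setting `hub_penalty=0` would recover an unbiased, uniform spread.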
