The workforce analyzer: group discovery among LinkedIn public profiles

In this paper, we describe two methods for discovering groups of users among LinkedIn public profiles. We first cluster profiles according to their professional background: we combine the K-means technique with the gap statistic method, which estimates the number of clusters, and use tag clouds to examine the resulting groups. In the second phase, we classify the same profiles by relying on a knowledge base. To this end, we design a multi-label support vector machine classifier that takes advantage of the LinkedIn job ads taxonomy. Finally, we contrast the results of both methods and provide insights into the trending professional orientations of the workforce from an online perspective.
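
As an illustration of the clustering phase, the following minimal sketch pairs K-means with the gap statistic to pick a number of clusters. It is not the implementation used in this work: the synthetic 2-D points stand in for profile feature vectors, the function names (`kmeans`, `log_dispersion`, `gap_statistic`) and all parameter values are assumptions made for the example, and the reference data are drawn uniformly over the bounding box of the input.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment, consistent with the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def log_dispersion(X, k, seed=0):
    """log of the within-cluster sum of squared distances, log(W_k)."""
    labels, centroids = kmeans(X, k, seed=seed)
    return np.log(np.sum((X - centroids[labels]) ** 2))

def gap_statistic(X, k_max=4, n_refs=5, seed=0):
    """Return (chosen k, list of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        # Dispersion on reference sets drawn uniformly over the bounding box.
        ref_logs = np.array([
            log_dispersion(rng.uniform(lo, hi, size=X.shape), k, seed=seed)
            for _ in range(n_refs)
        ])
        gaps.append(ref_logs.mean() - log_dispersion(X, k, seed=seed))
        sks.append(ref_logs.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k, gaps
    return k_max, gaps

# Hypothetical data: three synthetic groups standing in for profile features.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

best_k, gaps = gap_statistic(X, k_max=4, n_refs=5)
labels, centroids = kmeans(X, best_k)
print("chosen k:", best_k)
```

Because this sketch initializes K-means from a single random draw, the chosen value can depend on the seed; in practice several restarts per value of k would make the estimate more stable.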
