Towards Finding Valuable Topics

Enterprises depend on their information workers finding valuable information to be productive. However, existing enterprise search and recommendation systems have few studies to draw on regarding the correlation between information content and information workers’ productivity. In this paper, we combine content, social network, and revenue analysis to identify computational metrics for finding valuable information content in people’s electronic communications within a large-scale enterprise. Specifically, we focus on two questions: (1) how do the topics extracted from such content correlate with information workers’ performance? and (2) how can we find valuable topics with potentially high impact on employee performance? For the first question, we associate the topics with the corresponding workers’ productivity, measured by the revenue they generate. This allows us to evaluate the topics’ influence on productivity. We further verify that the derived topic values are consistent with human assessors’ subjective evaluations. For the second question, we identify and evaluate a set of significant factors, including both content and social network factors. In particular, the social network factors are better at filtering out low-value topics, while the content factors are more effective at selecting a few top high-value topics. In addition, we demonstrate that a support vector regression model that combines these factors can already find valuable topics effectively. We believe that our results provide significant insights toward scientific advances in finding valuable information.
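As a rough illustration of the first question, associating a topic with the revenue of the workers who discuss it could be sketched as below. The abstract does not give the exact formula, so the revenue-weighted average used here, the function name `topic_values`, and the toy data are all assumptions made for illustration only, not the paper's method.

```python
from collections import defaultdict


def topic_values(worker_topics, worker_revenue):
    """Score each topic by a revenue-weighted average (an assumed formula).

    worker_topics:  {worker: {topic: weight}}, hypothetical topic proportions
                    extracted from each worker's communications.
    worker_revenue: {worker: revenue}, the productivity measure.
    """
    weighted_sum = defaultdict(float)   # sum of weight * revenue per topic
    weight_total = defaultdict(float)   # sum of weight per topic
    for worker, topics in worker_topics.items():
        revenue = worker_revenue[worker]
        for topic, weight in topics.items():
            weighted_sum[topic] += weight * revenue
            weight_total[topic] += weight
    # Skip topics whose total weight is zero to avoid dividing by zero.
    return {
        topic: weighted_sum[topic] / weight_total[topic]
        for topic in weighted_sum
        if weight_total[topic] > 0
    }


# Toy data (invented): two workers, two topics.
worker_topics = {
    "alice": {"contracts": 0.75, "travel": 0.25},
    "bob": {"contracts": 0.25, "travel": 0.75},
}
worker_revenue = {"alice": 200.0, "bob": 100.0}

values = topic_values(worker_topics, worker_revenue)
ranked = sorted(values, key=values.get, reverse=True)
```

In this toy setup, a topic discussed mostly by higher-revenue workers ranks above one discussed mostly by lower-revenue workers. Scores of this kind could then serve as regression targets for a model that combines content and social network factors.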
