Multivariate Beta Mixture Model for Automatic Identification of Topical Authoritative Users in Community Question Answering Sites

A community question answering (CQA) site is an online community that provides users with valuable information on a wide variety of topics in question-answer form. A major problem in CQA is identifying the authoritative users in the domain of a question, so that the question can be routed to the right experts and the best answer can be selected. Existing work suffers from one or more of the following limitations: 1) the lack of an automatic mechanism to distinguish between authoritative and non-authoritative users in specified topics; 2) a heavy dependence on training data in supervised learning, where manually obtaining labeled samples is too time-consuming; and 3) a reliance on cutoff parameters to estimate an authority score. In this paper, a parameterless mixture model approach is proposed to identify topical authoritative users and overcome these limitations. A statistical framework based on multivariate beta mixtures is applied to each user's feature vector, which is composed of information related to the user's activities on the CQA site. The probability density function is then devised, and the beta mixture component that corresponds to the most authoritative users is identified. The suitability of the proposed approach is illustrated on real data from two CQA sites: StackOverflow and AskUbuntu. The results show that the proposed model identifies authoritative users more effectively than conventional classifiers and the Gaussian mixture model.
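The idea of fitting a beta mixture to user feature vectors and picking out the authoritative component can be illustrated with a minimal sketch. It is not the estimation procedure of the paper: it assumes that every feature has already been scaled to the open interval (0, 1), that features are independent within a component, that the number of components k is fixed by hand, and that a weighted method-of-moments update is an acceptable stand-in for a maximum-likelihood M-step. The function names `beta_logpdf`, `fit_beta_mixture`, and `authoritative_component` are hypothetical.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log-density of a Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def fit_beta_mixture(X, k=2, iters=50, eps=1e-6):
    """EM-style fit of a k-component mixture of independent betas.

    X is a list of feature vectors with values in (0, 1).
    Assumption: the M-step uses weighted moment matching, not exact ML.
    Returns (weights, params, resp), where params[j][t] = (a, b).
    """
    n, d = len(X), len(X[0])
    # Keep values strictly inside (0, 1) so the logarithms are defined.
    X = [[min(max(v, eps), 1.0 - eps) for v in row] for row in X]

    # Hard initialisation: rank users by total activity, split into k groups.
    order = sorted(range(n), key=lambda i: sum(X[i]))
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0

    weights, params = [], []
    for _ in range(iters):
        # M-step: mixing weights and per-feature (a, b) from weighted moments.
        weights, params = [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights.append(nj / n)
            comp = []
            for t in range(d):
                m = sum(resp[i][j] * X[i][t] for i in range(n)) / nj
                v = sum(resp[i][j] * (X[i][t] - m) ** 2 for i in range(n)) / nj
                m = min(max(m, eps), 1.0 - eps)
                # A beta variance must stay below m * (1 - m).
                v = min(max(v, 1e-8), m * (1.0 - m) * 0.999)
                c = m * (1.0 - m) / v - 1.0
                comp.append((m * c, (1.0 - m) * c))
            params.append(comp)

        # E-step: posterior probability of each component for each user.
        for i in range(n):
            logs = [math.log(weights[j])
                    + sum(beta_logpdf(X[i][t], *params[j][t]) for t in range(d))
                    for j in range(k)]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            total = sum(ex)
            resp[i] = [e / total for e in ex]
    return weights, params, resp


def authoritative_component(params):
    """Index of the component with the largest average beta mean a / (a + b)."""
    means = [sum(a / (a + b) for a, b in comp) / len(comp) for comp in params]
    return max(range(len(params)), key=lambda j: means[j])
```

As a usage sketch on synthetic data (again an assumption, not data from StackOverflow or AskUbuntu), one could draw a large group of low-activity users with `random.betavariate(2, 8)` and a small group of high-activity users with `random.betavariate(8, 2)`, call `fit_beta_mixture` on the combined list, and treat users whose posterior for `authoritative_component(params)` exceeds that of the other components as candidate authorities. Because membership is read off the posteriors, no score cutoff needs to be chosen by hand, which is the property the abstract refers to as parameterless.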
