A novel machine learning approach to rank web forum posts

User-generated content in Web forums is rich but varies in quality, ranging from excellent, detailed opinions to simple repetition of earlier posts, or even spam. This makes it difficult to find high-quality information during post browsing, retrieval, and other Web forum applications. In this paper, we propose a novel machine learning approach named LGPRank, which uses a genetic programming architecture to rank Web forum posts according to the quality of their content. To address the shortcomings of current studies, we take both the semantic-free and semantic-specific information of a post into account. We propose a set of new features, named Latent Dirichlet Allocation (LDA) semantic features, which are computed in the LDA topic space. The proposed features, together with content surface features and forum-specific features, are used in the learning process. Experiments are conducted on three Web forum datasets in comparison with methods used in prior ranking research. LGPRank outperforms all the other methods in terms of the P@N, NDCG@N, and MAP measures. Furthermore, the experimental results indicate that the proposed LDA semantic features improve ranking performance.
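
The three evaluation measures named above can be illustrated with a minimal sketch. It assumes each ranked list is given as a sequence of graded relevance labels in ranked order (0 meaning not relevant), and it uses the exponential-gain form of DCG that is common in learning-to-rank work; the exact definitions and label scales used in the experiments are not stated here, so the function names and formulas below are illustrative assumptions rather than the paper's implementation.

```python
import math


def precision_at_n(relevances, n):
    """P@N: fraction of the top-n ranked posts with a label greater than 0."""
    return sum(1 for r in relevances[:n] if r > 0) / n


def dcg_at_n(relevances, n):
    """DCG@N with exponential gain (assumed form): (2^rel - 1) / log2(rank + 1)."""
    return sum(
        (2 ** r - 1) / math.log2(i + 2)  # i is 0-based, so rank + 1 == i + 2
        for i, r in enumerate(relevances[:n])
    )


def ndcg_at_n(relevances, n):
    """NDCG@N: DCG@N divided by the DCG@N of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """AP: mean of the precision values at each rank holding a relevant post."""
    hits, total = 0, 0.0
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average of AP over several ranked lists (e.g. one per thread)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, `precision_at_n([1, 0, 1, 0], 2)` evaluates the top two posts of a hypothetical ranking in which the first and third posts are relevant. Because NDCG normalises by the ideal ordering, a list that is already sorted by label scores 1.0 regardless of the label scale.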
