Data-Driven Proactive Policy Assurance of Post Quality in Community Q&A Sites

To ensure post quality, Q&A sites usually develop a list of quality assurance guidelines ("dos and don'ts") and adopt a collaborative editing mechanism to fix quality violations. Both measures have limits. Quality guidelines are mostly high-level principles, and many tacit and context-sensitive aspects of the expected quality cannot be easily enforced by a set of explicit rules. Collaborative editing is a reactive mechanism that takes effect only after low-quality posts have been posted. Our study of collaborative editing data on Stack Overflow suggests that tacit and context-sensitive quality-assurance knowledge is manifested in the editing patterns of large numbers of collaborative edits. Inspired by this observation, we develop and evaluate a Convolutional Neural Network based approach that learns editing patterns from historical post edits to predict whether a post needs editing. Our approach provides a proactive policy assurance mechanism that warns users of potential quality issues in a post before it is posted.
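
To make the prediction step concrete, the following is a minimal sketch of how a convolutional text classifier can score a draft post. It is not the authors' model: the function names (`embed`, `conv_max_pool`, `edit_probability`), the hash-seeded embeddings, the filter width, and the random untrained weights are all assumptions chosen for illustration. In a real system the embeddings and weights would be learned from historical post edits.

```python
import math
import random


def embed(tokens, dim, seed=0):
    """Map each token to a fixed pseudo-random vector.

    Hypothetical stand-in for learned word embeddings.
    """
    vectors = []
    for tok in tokens:
        rng = random.Random(f"{seed}:{tok}")  # same token -> same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def conv_max_pool(vectors, filters, width):
    """Slide each filter over windows of `width` tokens, apply ReLU,
    and keep the maximum activation per filter (max-over-time pooling)."""
    pooled = []
    for filt in filters:  # each filter is a flat list of width * dim weights
        best = 0.0  # ReLU floor
        for start in range(len(vectors) - width + 1):
            window = [x for vec in vectors[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def edit_probability(tokens, filters, out_weights, bias, width, dim):
    """Return a score in [0, 1]; higher means 'more likely to need editing'."""
    # Pad very short posts so at least one window exists.
    padded = list(tokens) + ["<pad>"] * max(0, width - len(tokens))
    features = conv_max_pool(embed(padded, dim), filters, width)
    score = sum(w * f for w, f in zip(out_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))  # logistic output


# Untrained, randomly initialised parameters (assumption for the sketch).
DIM, WIDTH, N_FILTERS = 8, 3, 4
_rng = random.Random(42)
FILTERS = [[_rng.uniform(-1.0, 1.0) for _ in range(WIDTH * DIM)]
           for _ in range(N_FILTERS)]
OUT_WEIGHTS = [_rng.uniform(-1.0, 1.0) for _ in range(N_FILTERS)]

draft = "how do i fix this plz".split()
p = edit_probability(draft, FILTERS, OUT_WEIGHTS, 0.0, WIDTH, DIM)
warn_user = p >= 0.5  # a proactive check would run before the post is submitted
```

With untrained weights the score is meaningless; the sketch only shows the data flow from tokens through convolution and pooling to a single warning decision made before posting.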
