User-Constrained Clustering in Online Requirements Forums

[Context & motivation:] Software development projects involving geographically dispersed stakeholders often use web-based discussion forums to gather feature requests. Our previous study showed that users tend to create redundant threads as well as large, unfocused mega-threads. [Question/problem:] In this paper we propose a novel solution for integrating user feedback into the process of dynamically and iteratively clustering features into discussion threads. [Principal ideas/results:] We integrate feedback in the form of stick-together and move-apart advice, plus user-defined tags, into our consensus-based clustering process. [Contribution:] Experimental results demonstrate that our approach delivers high-quality, stable clusters that facilitate forum-based requirements elicitation.
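
To make the idea of stick-together and move-apart advice concrete, the following is a minimal illustrative sketch, not the algorithm evaluated in the paper. It assumes a simple evidence-accumulation scheme: several base partitions of the feature requests are combined into a co-association matrix, pairwise user advice overrides individual entries, and threads are read off by linking pairs whose score exceeds a threshold. The function names (`co_association`, `apply_advice`, `consensus_clusters`), the 0.5 threshold, and the toy data are all hypothetical; user-defined tags are not modeled.

```python
from itertools import combinations


def co_association(partitions, n):
    """Fraction of base partitions in which each pair of items shares a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    total = len(partitions)
    co = [[counts[i][j] / total for j in range(n)] for i in range(n)]
    for i in range(n):
        co[i][i] = 1.0
    return co


def apply_advice(co, stick_together, move_apart):
    """Override pairwise scores with user advice (assumed hard overrides)."""
    for i, j in stick_together:
        co[i][j] = co[j][i] = 1.0
    for i, j in move_apart:
        co[i][j] = co[j][i] = 0.0
    return co


def consensus_clusters(co, threshold=0.5):
    """Link every pair scoring above the threshold; return connected groups."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: five feature requests, three base partitions (cluster labels).
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 2],
]

before = consensus_clusters(co_association(partitions, 5))
advised = apply_advice(
    co_association(partitions, 5),
    stick_together=[(3, 4)],
    move_apart=[(1, 2)],
)
after = consensus_clusters(advised)
print(before)
print(after)
```

In this toy run, the unadvised consensus chains requests 0 through 3 into a single large thread, while the two pieces of advice split it and pull request 4 into the second thread. Because groups are formed by transitive linking, a move-apart pair can still end up together through a third item; a real constrained method would need to enforce such advice explicitly.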
