Pacific Asia Conference on Information Systems (PACIS), 7-15-2012

Clustering Similar Questions In Social Question Answering Services

Social Question Answering (SQA) services are dedicated platforms where users respond to other users' questions, building a community in which users share and interactively rate questions and answers (Liu et al., 2008). SQA services are emerging as a valuable information resource, rich not only in the expertise of the user community but also in its interactions and insights, expressed as user comments and ratings. Because each user interaction in an SQA service is different and questions are varied and complex, identifying similar questions whose answers can be reused is difficult. Existing studies have yet to converge into a coherent research stream on identifying similar questions by harnessing the information richness of SQA services. This paper develops a quadripartite graph-based clustering (QGC) approach that harnesses the relationships of a question with its common answers and associated users. The QGC approach was found to outperform the baseline clustering techniques in identifying similar questions in SQA corpora. We believe that these findings can guide future developments in the reuse of similar questions in SQA services.
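The idea of relating questions through common answers and associated users can be illustrated with a small sketch. This is not the QGC algorithm itself, which the abstract does not specify. It assumes the four node types are questions, answers, askers, and answerers, and it swaps in a much simpler technique: questions that share any answer or user node are merged with union-find, so clusters are connected components. The function name `cluster_questions` and the record layout are hypothetical.

```python
from collections import defaultdict


def cluster_questions(records):
    """Group questions that share an answer, asker, or answerer.

    records: iterable of (question_id, answer_id, asker_id, answerer_id).
    The four-field layout is an assumption made for illustration only.
    Returns a sorted list of sorted question-id lists.
    """
    parent = {}  # union-find forest over question ids only

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Map every non-question node to the set of questions attached to it.
    neighbours = defaultdict(set)
    for q, ans, asker, answerer in records:
        find(q)  # register the question even if it shares nothing
        for kind, node in (("answer", ans), ("asker", asker), ("answerer", answerer)):
            neighbours[(kind, node)].add(q)

    # Questions attached to the same node end up in the same cluster.
    for qs in neighbours.values():
        ordered = sorted(qs)
        for other in ordered[1:]:
            union(ordered[0], other)

    clusters = defaultdict(set)
    for q in list(parent):
        clusters[find(q)].add(q)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: q1 and q2 share answer a1; q3 and q4 share asker u5.
records = [
    ("q1", "a1", "u1", "u2"),
    ("q2", "a1", "u3", "u4"),
    ("q3", "a2", "u5", "u6"),
    ("q4", "a3", "u5", "u7"),
]
print(cluster_questions(records))
```

Treating any shared node as a hard link is deliberately crude. A weighted similarity over the shared answers and users, followed by a standard clustering step, would be closer in spirit to a graph-based approach, but the weighting scheme would itself be an assumption.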

[1] Eugene Agichtein, et al. CoCQA: Co-Training over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, 2008, EMNLP.

[2] Thomas R. Gruber, et al. Collective knowledge systems: Where the Social Web meets the Semantic Web, 2008, J. Web Semant.

[3] Dawid Weiss, et al. A survey of Web clustering engines, 2009, CSUR.

[4] Robert Villa, et al. The effectiveness of query-specific hierarchic clustering in information retrieval, 2002, Inf. Process. Manag.

[5] Hoa Trang Dang, et al. Overview of the TREC 2006 Question Answering Track, 2006, TREC.

[6] Koichi Takeuchi. Extraction of Verb Synonyms using Co-clustering Approach, 2008, 2008 Second International Symposium on Universal Communication.

[7] Jin-Sha Yuan, et al. An Efficient User Access Pattern Clustering Algorithm, 2007, 2007 International Conference on Machine Learning and Cybernetics.

[8] W. Bruce Croft, et al. Finding similar questions in large question and answer archives, 2005, CIKM '05.

[9] Stefan Siersdorfer, et al. A neighborhood-based approach for clustering of linked document collections, 2006, CIKM '06.

[10] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[11] Ji-Rong Wen, et al. Query clustering using user logs, 2002, TOIS.

[12] Jing Hua, et al. Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering, 2008, WWW.

[13] Xin Chen, et al. Exploit the tripartite network of social tagging for web clustering, 2009, CIKM.

[14] Farshad Fotouhi, et al. Co-clustering Documents and Words Using Bipartite Isoperimetric Graph Partitioning, 2006, Sixth International Conference on Data Mining (ICDM'06).

[15] M E J Newman. Assortative mixing in networks, 2002, Physical review letters.

[16] Eugene Agichtein, et al. Predicting information seeker satisfaction in community question answering, 2008, SIGIR '08.

[17] Rich Gazan, et al. Social Q&A, 2011, J. Assoc. Inf. Sci. Technol.

[18] Mohan John Blooma, et al. Research Issues In Community Based Question Answering, 2011, PACIS.

[19] Tao Qin, et al. LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval, 2007.

[20] Doug Beeferman, et al. Agglomerative clustering of a search engine query log, 2000, KDD '00.

[21] Kai Wang, et al. A syntactic tree matching approach to finding similar questions in community-based qa services, 2009, SIGIR.

[22] V. Latora, et al. Complex networks: Structure and dynamics, 2006.

[23] Alton Yeow-Kuan Chua, et al. Quadripartite Graph-based Clustering of Questions, 2011, 2011 Eighth International Conference on Information Technology: New Generations.

[24] Eugene Agichtein, et al. Exploring question subjectivity prediction in community QA, 2008, SIGIR '08.

[25] W. Bruce Croft, et al. Finding semantically similar questions based on their answers, 2005, SIGIR '05.

[26] M Ausloos, et al. Uncovering collective listening habits and music genres in bipartite networks, 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[27] Eugene Agichtein, et al. Learning to recognize reliable users and content in social media with coupled mutual reinforcement, 2009, WWW '09.

[28] A. Arenas, et al. Community analysis in social networks, 2004.

[29] Inderjit S. Dhillon, et al. Co-clustering documents and words using bipartite spectral graph partitioning, 2001, KDD '01.

[30] Marcel Ausloos, et al. Contextualising tags in collaborative tagging systems, 2009, HT '09.

[31] Kenneth Wai-Ting Leung, et al. Personalized Concept-Based Clustering of Search Engine Queries, 2008, IEEE Transactions on Knowledge and Data Engineering.

[32] Yiling Chen, et al. A Bipartite Graph Co-Clustering Approach to Ontology Mapping, 2003.

[33] Haesun Park, et al. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[34] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[35] Akihiro Tamura, et al. Classification of Multiple-Sentence Questions, 2005, IJCNLP.

[36] Marija Mitrovic, et al. Mixing patterns and communities on bipartite graphs on web-based social interactions, 2009, 2009 16th International Conference on Digital Signal Processing.

[37] Stephen Gilmore, et al. Evaluating the Performance of Skeleton-Based High Level Parallel Programs, 2004, International Conference on Computational Science.

[38] B. Walter, et al. Fast agglomerative clustering for rendering, 2008, 2008 IEEE Symposium on Interactive Ray Tracing.