Mining of Relevant and Informative Posts from Text Forums

In the modern world, a key competitive advantage for anyone is the ability to obtain information quickly and conveniently. Web forums occupy a significant place among sources of information and are a good place to gain professionally significant knowledge on many topics. However, it is not always easy to identify the parts of a forum that contain useful information matching a user's needs. In this paper we consider the problem of automatic forum text summarization and describe methods that can help solve it. We study the difference between relevance-oriented and usefulness-oriented query types. We describe our dataset, which contains over 4,000 annotated posts from web forums on various subject domains. Experts rated each post on a scale from 0 to 5 for the selected query types. The results of our study can provide a basis for building information retrieval applications that reduce users' search time and improve the quality of search results.
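To make the annotation scheme concrete, here is a minimal Python sketch of how a rated post might be represented and used. It assumes each post can receive several expert ratings per query type and that these are averaged; the class `AnnotatedPost`, the function `select_posts`, the query-type labels, and the threshold of 3.0 are all hypothetical illustrations, not the authors' data format or method. Picking the highest-rated posts is only a naive extractive baseline for summarization.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical labels for the two query types discussed in the abstract.
RELEVANCE = "relevance"
USEFULNESS = "usefulness"


@dataclass
class AnnotatedPost:
    """A forum post with expert ratings on a 0-5 scale, kept per query type."""
    post_id: str
    text: str
    # Assumption: a post may receive several ratings per query type.
    ratings: dict[str, list[int]] = field(default_factory=dict)

    def add_rating(self, query_type: str, score: int) -> None:
        # Enforce the 0-5 scale described in the abstract.
        if not 0 <= score <= 5:
            raise ValueError(f"rating must be in 0..5, got {score}")
        self.ratings.setdefault(query_type, []).append(score)

    def mean_rating(self, query_type: str) -> float:
        """Average rating for a query type; 0.0 if the post was not rated."""
        scores = self.ratings.get(query_type, [])
        return mean(scores) if scores else 0.0


def select_posts(posts, query_type, threshold=3.0):
    """Hypothetical naive extractive summary: keep posts whose mean rating
    meets the threshold, ordered from highest to lowest rating."""
    chosen = [p for p in posts if p.mean_rating(query_type) >= threshold]
    return sorted(chosen, key=lambda p: p.mean_rating(query_type), reverse=True)


# Usage: the same thread yields different selections per query type.
thread = [
    AnnotatedPost("p1", "Try updating the driver first."),
    AnnotatedPost("p2", "Same problem here."),
    AnnotatedPost("p3", "This thread is about driver errors on laptops."),
]
thread[0].add_rating(USEFULNESS, 5)
thread[0].add_rating(USEFULNESS, 4)
thread[1].add_rating(USEFULNESS, 1)
thread[2].add_rating(RELEVANCE, 5)

print([p.post_id for p in select_posts(thread, USEFULNESS)])  # → ['p1']
print([p.post_id for p in select_posts(thread, RELEVANCE)])   # → ['p3']
```

Keeping ratings per query type, rather than a single score, reflects the abstract's point that a post judged relevant is not necessarily judged useful, and vice versa.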
