Enabling easier information access in online discussion forums

Online discussion forums have become popular in recent times. They provide a platform for people from different parts of the world who share a common interest to come together, discuss topics of mutual interest, and seek solutions to their problems. There are hundreds of thousands of internet forums containing tens of millions of discussion threads, and they are thus an important source of human-generated information that needs to be efficiently managed. In this dissertation, I focus on the following three specific problems: 1. Searching for relevant discussion threads in an online forum archive. A typical discussion thread differs from a generic web page in its structure and linking patterns, and its content is contributed by a large number of participants. A probabilistic retrieval model is proposed that takes into account structural properties, content properties, and various non-textual relevance indicators such as thread popularity and user expertise. The proposed retrieval model achieved significant improvements over a standard language-model-based retrieval model and over methods that are typically used in online forum websites. 2. Offering query suggestions in a forum search engine. Compared to a web search engine, a typical forum website receives a much smaller number of search requests, and hence the query log of a forum search engine is small. A probabilistic query suggestion mechanism is proposed that does not rely on query logs and can offer suggestions by computing completions from the forum corpus itself. Experimental results on two different datasets show that the proposed approach achieved statistically significant improvements over two state-of-the-art baseline query suggestion techniques. 3. Identifying the role of each user message in a discussion. Different messages in a thread serve different purposes in the discussion.
I investigated the problem of classifying individual user posts in an online discussion thread. For post classification, I designed and experimented with a variety of features derived from each post's content, the thread structure, user behavior, and sentiment analysis of the post's text. Applications of post classification are also demonstrated in forum thread retrieval and discussion summarization tasks.
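
As a rough illustration of the log-free idea in problem 2, the sketch below completes the last, partially typed word of a query using word and word-pair counts gathered from the forum posts themselves. It is a minimal frequency-based sketch and not the probabilistic model proposed in the dissertation; the function names (`build_counts`, `suggest`), the whitespace tokenization, and the ranking by raw counts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely normalize punctuation and handle stop words.
    return text.lower().split()


def build_counts(posts):
    """Collect word counts and next-word counts from the forum corpus."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for post in posts:
        tokens = tokenize(post)
        unigrams.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams


def suggest(partial_query, unigrams, bigrams, k=3):
    """Return up to k completions of the last word of partial_query.

    Candidates are words from the corpus that start with the typed prefix.
    They are ranked by how often they follow the preceding query word,
    falling back to overall word frequency when no such word is found.
    """
    tokens = tokenize(partial_query)
    if not tokens:
        return []
    *context, prefix = tokens

    matches = []
    if context and context[-1] in bigrams:
        followers = bigrams[context[-1]]
        matches = [(w, c) for w, c in followers.items() if w.startswith(prefix)]
    if not matches:
        # Fallback: no usable context, so rank by corpus-wide frequency.
        matches = [(w, c) for w, c in unigrams.items() if w.startswith(prefix)]

    # Highest count first; ties broken alphabetically for determinism.
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [" ".join(context + [w]) for w, _ in matches[:k]]
```

Because the counts come only from post text, a suggester of this kind needs no query log, which is the property the abstract emphasizes; how candidates are scored in the actual dissertation is not specified here.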