Explicit and Latent Topic Representations of Information Spaces in Social Information Retrieval

We evaluate the suitability of latent and explicit semantic spaces of documents for Information Retrieval (IR) tasks using a dataset obtained from the Q&A community Stack Exchange. In addition, we explore the ability of the latent semantic spaces to reconstruct human relevance judgments. The latent semantic spaces are generated with Latent Dirichlet Allocation (LDA), while the explicit semantic spaces are modeled using Explicit Semantic Analysis (ESA). In the first part of the experiment, a series of ad-hoc information retrieval tasks is performed, interpreting closeness in the latent and explicit semantic spaces as a criterion for relevance. In the second part, we investigate whether the latent semantic representation can be used to infer user-defined quality assessments of answers. The findings suggest that the semantic spaces show a correlation between a query and its relevant information items; however, both algorithms are outperformed by a simple Vector Space Model using TF-IDF weighting. Furthermore, no significant correlation could be demonstrated between the user-defined order of relevant answers to a question and the similarity-based order obtained by using closeness in the latent semantic space as the similarity function.
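The two evaluation ideas above (ranking documents by closeness to a query, and comparing a similarity-based order of answers with a user-defined order) can be illustrated with a small sketch. It is not the authors' implementation. It uses the TF-IDF Vector Space Model baseline as the similarity space, and the same comparison would apply if LDA topic vectors or ESA concept vectors were substituted for the TF-IDF vectors. Spearman's rank correlation is an assumption, since the abstract does not name the coefficient used. The answers, query, and "user order" are hypothetical, and the helper names (`build_idf`, `tfidf`, `rank_by_similarity`, `spearman_rho`) are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also stem and drop stop words.
    return text.lower().split()


def build_idf(docs):
    """Inverse document frequency idf(t) = log(N / df(t)) over the collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector (raw term frequency * idf); unseen terms are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_similarity(query, docs):
    """Document indices ordered by decreasing closeness (cosine) to the query."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scores = [cosine(q, tfidf(d, idf)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def spearman_rho(order_a, order_b):
    """Spearman's rho between two orderings of the same items (assumes no ties)."""
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items")
    pos_b = {item: rank for rank, item in enumerate(order_b)}
    d2 = sum((rank - pos_b[item]) ** 2 for rank, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical answers to one question, plus a hypothetical community (user) order.
answers = [
    "use a context manager to close the file",
    "call close on the file handle explicitly",
    "python garbage collection closes files eventually",
]
user_order = [0, 1, 2]
sim_order = rank_by_similarity("how to close a file in python", answers)
print(sim_order, spearman_rho(user_order, sim_order))
```

A rho close to 1 would mean the similarity-based order reproduces the user-defined order; the abstract reports that no significant correlation of this kind was found for the latent (LDA) space.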