Extracting the discussion structure in comments on news-articles

Several on-line daily newspapers offer readers the opportunity to directly comment on articles. In the Netherlands this feature is used quite often and the quality (grammatically and content-wise) is surprisingly high. We develop techniques to collect, store, enrichand analyze these comments. After giving a high-level overview of the Dutch 'commentosphere' we zoom in on extracting the discussion structure found in flat comment threads; people not only comment on the news article, they also heavily comment on other comments, resembling discussion fora. We show how techniques from information retrieval, natural language processing and machine learning can be used to extract the 'reacts-on' relation between comments with high precision and recall.