Extracting Chatbot Knowledge from Online Discussion Forums

This paper presents a novel approach for extracting high-quality 〈thread-title, reply〉 pairs as chat knowledge from online discussion forums, in order to efficiently support the construction of a domain-specific chatbot. Given a forum, the high-quality 〈thread-title, reply〉 pairs are extracted using a cascaded framework. First, an SVM classifier selects, from all the replies, those that are logically relevant to the thread title of the root message, based on correlations such as structure and content. Then, the extracted 〈thread-title, reply〉 pairs are ranked with a ranking SVM according to their content quality. Finally, the top-N 〈thread-title, reply〉 pairs are selected as chatbot knowledge. Experiments on a movie forum show that the proposed approach is effective.
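The control flow of the cascade described above (filter by relevance, rank by quality, keep the top N) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `extract_chat_knowledge`, `toy_is_relevant`, and `toy_quality_score` are hypothetical, and the two toy functions are simple word-overlap and length heuristics standing in for the SVM classifier and the ranking SVM, whose features and training are not specified in this abstract.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (thread_title, reply)


def extract_chat_knowledge(
    thread_title: str,
    replies: List[str],
    is_relevant: Callable[[str, str], bool],
    quality_score: Callable[[str, str], float],
    top_n: int,
) -> List[Pair]:
    """Filter replies by relevance, rank the survivors by quality, keep the top N."""
    # Stage 1: keep only replies judged relevant to the thread title.
    # (The paper uses an SVM classifier here; is_relevant is a stand-in.)
    candidates = [(thread_title, r) for r in replies if is_relevant(thread_title, r)]
    # Stage 2: order the remaining pairs by content quality, best first.
    # (The paper uses a ranking SVM here; quality_score is a stand-in.)
    ranked = sorted(candidates, key=lambda pair: quality_score(*pair), reverse=True)
    # Stage 3: select the top-N pairs as chatbot knowledge.
    return ranked[:top_n]


def toy_is_relevant(title: str, reply: str) -> bool:
    """Assumed heuristic: relevant if the reply shares any word with the title."""
    return bool(set(title.lower().split()) & set(reply.lower().split()))


def toy_quality_score(title: str, reply: str) -> float:
    """Assumed heuristic: longer replies score higher (title is unused)."""
    return float(len(reply.split()))


if __name__ == "__main__":
    pairs = extract_chat_knowledge(
        "best sci-fi movie",
        ["I think the best one is Blade Runner", "lol", "That movie was fine"],
        toy_is_relevant,
        toy_quality_score,
        top_n=1,
    )
    print(pairs)
```

Passing the two stages as callables keeps the cascade independent of the models behind them, so trained classifiers and rankers could replace the toy heuristics without changing the pipeline function.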