Discovering genres of online discussion threads via text mining

As course management systems (CMS) gain popularity in supporting teaching, the discussion forum has become a key component for facilitating interaction among students and teachers. Content analysis is the most common way to study a discussion forum, but it is labor-intensive: the coding process relies heavily on manual interpretation and consumes considerable time and effort. In an asynchronous virtual learning environment, an instructor must monitor the discussion forum regularly to maintain its quality, yet doing so is time-consuming and difficult, especially for K-12 teachers. This research proposes a genre classification system, called GCS, to automate the coding process. We treat coding as a document classification task and address it with modern data mining techniques. The genre of a posting can be an announcement, a question, a clarification, an interpretation, a conflict, an assertion, and so on. This research examines the coding coherence between GCS and expert judgment in terms of precision and recall, and discusses how we adjust the parameters of GCS to improve that coherence. Based on the empirical results, GCS adopts a cascade classification model to automate the coding process. An empirical evaluation of the genres classified from a repository of postings in a senior high school online earth science course shows that GCS can effectively facilitate the coding process and that the proposed cascade model can cope with the imbalanced distribution of discussion postings. These results suggest that GCS based on the cascade model can serve as an automatic posting-coding system.
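The idea of a cascade for imbalanced genres, together with the per-genre precision and recall used to compare automatic and expert coding, can be illustrated with a small sketch. This is not GCS itself. It assumes a two-stage cascade in which stage 1 separates the most frequent genre from all others and stage 2 distinguishes among the remaining, rarer genres. The base learner is a hand-rolled multinomial Naive Bayes, a swapped-in choice that may differ from the classifier GCS actually uses. The class names (`NaiveBayes`, `CascadeClassifier`), the helper `precision_recall`, and the toy postings in `TRAIN` are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over lowercase whitespace tokens, Laplace-smoothed.

    A stand-in base learner for illustration only (hypothetical, not GCS's).
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, lab in zip(docs, labels):
            self.counts[lab].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        # Ignore tokens never seen in training; pick the class with the
        # highest log posterior (log prior + smoothed log likelihoods).
        tokens = [w for w in doc.lower().split() if w in self.vocab]
        v = len(self.vocab)

        def log_posterior(c):
            score = math.log(self.prior[c] / self.n)
            for w in tokens:
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


class CascadeClassifier:
    """Hypothetical two-stage cascade for an imbalanced genre distribution.

    Stage 1 decides "majority genre" vs. "OTHER"; stage 2 is trained only on
    minority-genre postings, so rare genres are not swamped by the majority.
    """

    def __init__(self, majority):
        self.majority = majority

    def fit(self, docs, labels):
        binary = [lab if lab == self.majority else "OTHER" for lab in labels]
        self.stage1 = NaiveBayes().fit(docs, binary)
        rest = [(d, lab) for d, lab in zip(docs, labels) if lab != self.majority]
        self.stage2 = NaiveBayes().fit([d for d, _ in rest], [lab for _, lab in rest])
        return self

    def predict(self, doc):
        if self.stage1.predict(doc) == self.majority:
            return self.majority
        return self.stage2.predict(doc)


def precision_recall(gold, pred, label):
    """Precision and recall of one genre, treating expert codes as gold."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, invented postings (not data from the study).
TRAIN = [
    ("rocks erode slowly", "assertion"),
    ("rivers erode valleys", "assertion"),
    ("plates move slowly", "assertion"),
    ("rocks form layers", "assertion"),
    ("why do volcanoes erupt", "question"),
    ("why do tides rise", "question"),
    ("quiz moved to monday", "announcement"),
    ("homework due on friday", "announcement"),
]

if __name__ == "__main__":
    docs, labels = zip(*TRAIN)
    clf = CascadeClassifier(majority="assertion").fit(list(docs), list(labels))
    new_posts = ["rocks erode", "why do rivers rise", "homework due friday"]
    gold = ["assertion", "question", "announcement"]
    pred = [clf.predict(d) for d in new_posts]
    for genre in sorted(set(gold)):
        print(genre, precision_recall(gold, pred, genre))
```

Splitting the decision this way means the stage-2 model never sees the dominant genre, so its priors and word statistics reflect only the rarer genres. Whether GCS orders its stages like this is an assumption of the sketch.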
