The ACODEA framework: Developing segmentation and classification schemes for fully automatic analysis of online discussions

Research on online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow this analysis to be automated. However, state-of-the-art machine learning and text mining approaches yield models that do not transfer well between corpora on different topics. Segmentation is also a necessary step, yet trained models are often highly sensitive to the particulars of the segmentation that was used when the model was trained. Therefore, in prior published research on text classification in a CSCL context, the data was segmented by hand. We discuss work towards overcoming these challenges. We present a framework for developing coding schemes optimized for automatic segmentation and for context-independent coding that builds on this segmentation. The key idea is to extract the semantic and syntactic features of each word using part-of-speech tagging and named-entity recognition before the raw data is segmented and classified. Our results show that coding on the micro-argumentation dimension can be fully automated. Finally, we discuss how fully automated analysis can enable context-sensitive support for collaborative learning.
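
As a rough illustration of the key idea above, the following minimal Python sketch replaces each word with a syntactic or semantic label before any segmentation or classification takes place. It is not the authors' implementation: the lexicons `TOY_POS` and `TOY_ENTITIES`, the label names, and the function `abstract_tokens` are all hypothetical stand-ins for a trained part-of-speech tagger and named-entity recognizer.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
# A real pipeline would use a trained POS tagger and NER model instead.
TOY_POS = {
    "failed": "VERB", "passed": "VERB",
    "because": "SCONJ", "the": "DET",
    "exam": "NOUN", "test": "NOUN",
    "was": "AUX", "unfair": "ADJ", "easy": "ADJ",
}
TOY_ENTITIES = {"michael": "PERSON", "anna": "PERSON"}


def abstract_tokens(text):
    """Map every token to a named-entity or part-of-speech label."""
    tokens = re.findall(r"[A-Za-z]+|[.,!?;]", text)
    labels = []
    for tok in tokens:
        key = tok.lower()
        if key in TOY_ENTITIES:        # semantic feature: entity type
            labels.append(TOY_ENTITIES[key])
        elif key in TOY_POS:           # syntactic feature: part of speech
            labels.append(TOY_POS[key])
        elif tok in ".,!?;":
            labels.append("PUNCT")
        else:                          # unknown word in the toy lexicon
            labels.append("X")
    return labels


a = abstract_tokens("Michael failed because the exam was unfair.")
b = abstract_tokens("Anna passed because the test was easy.")
print(a)
print(a == b)
```

In this sketch the two sentences differ in their content words but map to the same label sequence, which suggests why features of this kind might carry over between corpora on different topics better than the raw words do.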
