Discovery and Analysis of Public Opinions on Controversial Topics in the Educational Domain

Argumentation is part of everyday life and work. People frequently need to identify arguments for or against a specific topic in order to present information or make a decision. The educational domain serves as a good example: Bachelor graduates often wonder whether they should pursue a Master’s degree or start working in industry, and finding the pros and cons of each option is crucial for making up their minds. The Web holds a vast and constantly growing amount of data, including many arguments on topics in various fields, but people are no longer satisfied with traditional search engines as a means of finding these arguments. They therefore look for more intelligent solutions, and this is where argumentation mining comes into play. In this work we present a conceptual design of a system whose task is to simplify access to argumentation information on a specific topic. We propose to implement such a system as a search engine that, given a topic as a query, looks for arguments on the Web. Because of computational limitations, we concentrate only on topics from the educational domain and on arguments in German. We also implement and evaluate the critical parts of the system: a focused crawler, an argument extraction and classification module, and the front-end interface. For the extraction and classification part we use supervised machine learning techniques. To this end, we first collect documents that contain arguments. Second, we define an annotation scheme and perform an annotation study. The result is a labeled corpus, which is used to train models for the argument extraction and classification experiments. Finally, we evaluate the influence of different classification algorithms and of different feature combinations, and perform an error analysis.
