A subjectivity classification framework for sports articles using improved cortical algorithms

Abstract

The enormous number of articles published daily on the Internet by a diverse array of authors often contains misleading or unwanted information, which makes activities such as sports betting riskier. As a result, extracting meaningful and reliable information from these sources becomes a time-consuming and nearly impossible task. In this context, labeling articles as objective or subjective is not a simple natural language processing task, because subjectivity can take several forms. With the rise of online sports betting driven by the revolution in Internet and mobile technology, an automated system capable of sifting through these data and finding relevant sources in a reasonable amount of time is a desirable and marketable product. In this work, we present a framework for the classification of sports articles composed of three stages. The first stage extracts articles from web pages using text extraction libraries, parses the text, and then tags words using Stanford’s part-of-speech tagger. The second stage extracts unique syntactic and semantic features and reduces them using our modified cortical algorithm (CA), hereafter CA*. The third stage classifies these texts as objective or subjective. Our framework was tested on a database of 1000 articles, manually labeled using Amazon’s crowdsourcing tool, Mechanical Turk, and we report results using CA, CA*, support vector machines, and one of its soft computing variants (LMSVM) as classifiers. A testing accuracy of 85.6% was achieved with fourfold cross-validation and a 40% reduction in features, using CA* trained with an entropy weight update rule and a cross-entropy cost function.
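The evaluation protocol named in the abstract (fourfold cross-validation and a cross-entropy cost) can be illustrated with a minimal sketch. It is not the authors' CA* implementation: the helper names `k_fold_indices` and `cross_entropy`, the contiguous, unshuffled fold layout, and the binary form of the cost are assumptions made for illustration only.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    Assumption: label 1 = subjective, 0 = objective (hypothetical encoding).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Fourfold split over a corpus the size of the one described (1000 articles).
folds = list(k_fold_indices(1000, k=4))
```

With 1000 articles, each fold holds out 250 for testing and trains on the remaining 750. In practice the articles would be shuffled (or stratified by label) before splitting; the sketch leaves that out for brevity.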
