Semi-supervised Learning for Affective Common-Sense Reasoning

Background

Big social data analysis is the area of research focused on collecting, examining, and processing large multi-modal and multi-source datasets, in order to discover patterns and correlations and to extract information from the Social Web. This is usually accomplished through supervised and unsupervised machine learning algorithms that learn from the available data. However, these algorithms are usually highly computationally expensive, either in the training or in the prediction phase, and are often unable to handle current data volumes. Parallel approaches have been proposed to boost processing speeds, but this clearly requires technologies that support distributed computations.

Methods

Extreme learning machines (ELMs) are an emerging learning paradigm that provides an efficient, unified solution to generalized feed-forward neural networks. ELM offers significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. However, ELM cannot be easily parallelized, due to the presence of a pseudo-inverse calculation. This paper therefore aims to find a reliable method for realizing a parallel implementation of ELM that can be applied to the large datasets typical of Big Data problems, employing the most recent technology for parallel in-memory computation, i.e., Spark, which is designed to deal efficiently with iterative procedures that recursively perform operations over the same data. Moreover, this paper shows how to take advantage of the most recent advances in statistical learning theory (SLT) to address the problem of selecting the ELM hyperparameters that give the best generalization performance: the performance of the candidate model-selection strategies (i.e., resampling methods and in-sample methods) is assessed by exploiting the most recent results in SLT and adapting them to the Big Data framework. The proposed approach has been tested on two affective analogical reasoning datasets. Affective analogical reasoning can be defined as the intrinsically human capacity to interpret the cognitive and affective information associated with natural language. In particular, we employed two benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to two models of a semantic network in which common-sense concepts are linked to a hierarchy of affective domain labels.

Results

The labeled data have been split into two sets: the first 20,000 samples have been used to build the model with ELM under the different SLT strategies, while the remaining 1,743 labeled samples have been held out as a reference set to test the performance of the learned model. The splitting process has been repeated 30 times in order to obtain statistically relevant results. The experiments were run on the Google Cloud Platform, in particular on the Google Compute Engine, with NM = 4 machines (machine type n1-highcpu-2), each with two cores and 1.8 GB of RAM, plus a 30 GB HDD, all equipped with Spark. The results on the affective datasets both show the effectiveness of the proposed parallel approach and identify the most suitable SLT strategies for this specific Big Data problem.

Conclusion

In this paper we showed how to build an ELM model with a novel scalable approach and how to carefully assess its performance, using the most recent results from SLT, for a sentiment analysis problem. Thanks to recent technologies and methods, the computational requirements of such learning algorithms can now be met at the scale of the large datasets typical of Big Data applications.
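
To make the role of the pseudo-inverse concrete, here is a minimal single-machine sketch of ELM training in Python/NumPy. The tanh activation, the Tikhonov regularization term `lam`, and all names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def elm_train(X, Y, n_hidden=500, lam=1e-3, rng=None):
    """Minimal ELM sketch: random hidden layer + regularized closed-form solve.

    X: (n_samples, n_features) inputs, Y: (n_samples, n_outputs) targets.
    n_hidden and lam are exactly the kind of hyperparameters that the
    SLT-based model selection discussed above would have to tune.
    """
    rng = np.random.default_rng(rng)
    # Hidden weights and biases are drawn at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                      # hidden-layer output matrix
    # Output weights minimize ||H beta - Y||^2 + lam ||beta||^2 in closed
    # form; this pseudo-inverse-style step is the hard part to parallelize,
    # since the matrix H couples all training samples.
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```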

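The paper targets Spark for the parallel implementation. One standard way to distribute the closed-form solve, sketched below, is to observe that H^T H and H^T Y decompose into sums of per-sample outer products, so each worker can reduce its partition to two small matrices and only h-by-h objects ever reach the driver. This is a common decomposition, not necessarily the authors' exact scheme.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-elm-sketch").getOrCreate()
sc = spark.sparkContext

def partition_stats(rows, W, b):
    """Reduce one partition of (x, y) pairs to its (H^T H, H^T Y) contribution."""
    HtH, HtY = None, None
    for x, y in rows:
        h = np.tanh(x @ W + b)              # hidden activations for one sample
        if HtH is None:
            HtH, HtY = np.outer(h, h), np.outer(h, y)
        else:
            HtH += np.outer(h, h)
            HtY += np.outer(h, y)
    # An empty partition contributes an additive zero.
    yield (HtH, HtY) if HtH is not None else (0.0, 0.0)

def parallel_elm_solve(rdd, W, b, lam=1e-3):
    """Compute ELM output weights from an RDD of (x, y) NumPy pairs."""
    W_b, b_b = sc.broadcast(W), sc.broadcast(b)
    HtH, HtY = rdd.mapPartitions(
        lambda rows: partition_stats(rows, W_b.value, b_b.value)
    ).reduce(lambda a, c: (a[0] + c[0], a[1] + c[1]))
    # Only small h-by-h (and h-by-k) matrices reach the driver; the full
    # n-by-h matrix H is never materialized. Solve the regularized
    # normal equations locally.
    return np.linalg.solve(HtH + lam * np.eye(HtH.shape[0]), HtY)
```

This turns the pseudo-inverse bottleneck into a single map-reduce pass over the data plus a small dense solve on the driver, which plays to Spark's strength at in-memory data-parallel aggregation.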
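Finally, the resampling protocol described in the Results (a 20,000/1,743 split repeated 30 times) can be sketched as a repeated hold-out search over the regularization strength. The grid of lambda values and the squared-error metric are assumptions for illustration; `elm_train` and `elm_predict` refer to the first sketch above.

```python
import numpy as np

def select_lambda(X, Y, lambdas=(1e-4, 1e-3, 1e-2, 1e-1), n_splits=30,
                  n_train=20000, seed=0):
    """Repeated hold-out model selection over the regularization grid."""
    rng = np.random.default_rng(seed)
    errors = {lam: [] for lam in lambdas}
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:]   # 20,000 / 1,743 split
        for lam in lambdas:
            W, b, beta = elm_train(X[tr], Y[tr], lam=lam, rng=rng)
            pred = elm_predict(X[te], W, b, beta)
            errors[lam].append(np.mean((pred - Y[te]) ** 2))
    # Keep the hyperparameter with the lowest mean held-out error.
    return min(lambdas, key=lambda lam: np.mean(errors[lam]))
```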