A new hybrid semi-supervised algorithm for text classification with class-based semantics

Vector Space Models (VSMs) are commonly used in language processing to represent certain aspects of natural language semantics. The semantics of a VSM come from the distributional hypothesis, which states that words that occur in similar contexts usually have similar meanings. In our previous work, we proposed novel semantic smoothing kernels based on class-specific transformations. These kernels use class-term matrices, which can be considered a new type of VSM. By using the class as the context, these methods extract class-specific semantics from word distributions both in documents and across classes. In this study, we adapt two of these semantic classification approaches to build a novel, high-performance semi-supervised text classification algorithm. The first approach is a Helmholtz-principle-based calculation of term meanings in the context of classes, used for the initial classification. The second is a semantic kernel based on supervised term weighting, used with Support Vector Machines (SVM) to build the final classification model. The approach used in the first phase is especially good at learning from very small datasets, while the approach in the second phase is especially good at eliminating noise in relatively large and noisy training sets when building a classification model. Overall, as a semantic semi-supervised learning algorithm, our approach can effectively use an abundant source of unlabeled instances to improve classification accuracy significantly, especially when the number of labeled instances is limited.
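
The two-phase flow described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed here: the Helmholtz-principle meaning calculation and the SVM semantic kernel are both replaced by a simple class-term frequency scorer, purely to show how a class-term matrix is built and how labels predicted in the first phase enlarge the training set for the second. All function names (`class_term_matrix`, `score_classify`, `two_phase`) and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict


def class_term_matrix(docs, labels):
    """Build a class-term matrix as {class: Counter(term -> count)}.

    docs is a list of token lists; labels holds one class per document.
    """
    matrix = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        matrix[label].update(tokens)
    return dict(matrix)


def score_classify(matrix, tokens):
    """Stand-in classifier (an assumption, not the Helmholtz-based method).

    Scores each class by the share of its term mass covered by the
    document's tokens and returns the best-scoring class.
    """
    best_label, best_score = None, -1.0
    for label in sorted(matrix):  # sorted for deterministic tie-breaking
        counts = matrix[label]
        total = sum(counts.values())
        score = sum(counts[t] for t in tokens) / total if total else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def two_phase(labeled_docs, labels, unlabeled_docs):
    """Phase 1 labels the unlabeled documents; phase 2 rebuilds the model."""
    # Phase 1: learn from the small labeled set and predict pseudo-labels.
    initial = class_term_matrix(labeled_docs, labels)
    pseudo = [score_classify(initial, doc) for doc in unlabeled_docs]
    # Phase 2: build the final model on labeled plus pseudo-labeled data.
    # In the approach described above, an SVM with a semantic kernel would
    # be trained here instead of rebuilding the same matrix.
    final = class_term_matrix(labeled_docs + unlabeled_docs, labels + pseudo)
    return pseudo, final


labeled_docs = [["goal", "match", "team"], ["stock", "market", "price"]]
labels = ["sport", "finance"]
unlabeled_docs = [["team", "goal", "win"], ["price", "stock", "fall"]]

pseudo_labels, final_matrix = two_phase(labeled_docs, labels, unlabeled_docs)
```

In this toy run the unlabeled documents contribute new terms such as `win` and `fall` to the class-term matrix, which is the sense in which unlabeled data enriches the second-phase model. Any noise introduced by wrong pseudo-labels would have to be handled by the second-phase learner.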
