Feature Selection for Ordinal Text Classification

Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. The problem is receiving increased attention from the sentiment analysis and opinion mining community, owing to the importance of automatically rating large amounts of product review data in digital form. As in other supervised learning tasks, such as binary or multiclass classification, feature selection is often needed to improve efficiency and to avoid overfitting. However, while feature selection has been extensively studied for other classification tasks, it has not been studied for ordinal classification. In this letter, we present six novel feature selection methods specifically devised for ordinal classification and test them on two data sets of product reviews against three methods previously known from the literature, using two learning algorithms from the support vector regression tradition. The experimental results show that all six proposed metrics largely outperform the three baseline techniques (and are more stable than the baselines by an order of magnitude), on both data sets and for both learning algorithms.
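The general pipeline the abstract describes can be sketched in code: score every term with a filter function that exploits the ordering of the labels, then keep the top-k terms before training a regression-based learner. The scorer below (absolute correlation between term presence and the rating) is a minimal stand-in chosen for illustration; it is not one of the six metrics proposed in the letter, and all names and data are hypothetical.

```python
# Sketch of filter-style feature selection for ordinal text classification.
# The scoring function here (|Pearson correlation| between term presence and
# the ordinal rating) is an illustrative assumption, NOT the paper's metrics;
# it merely shows how a filter can exploit the order of the rating scale,
# which standard multiclass metrics ignore.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy) if vx and vy else 0.0

def select_features(docs, ratings, k):
    """docs: list of token lists; ratings: ordinal labels (e.g. 1..5).
    Returns the k terms whose presence correlates most with the rating."""
    vocab = sorted({t for d in docs for t in d})
    scores = {}
    for term in vocab:
        presence = [1.0 if term in d else 0.0 for d in docs]
        scores[term] = abs(pearson(presence, ratings))
    return sorted(vocab, key=lambda t: -scores[t])[:k]

# Toy review data: each document is a bag of words with a 1..5 star rating.
docs = [["awful", "boring"], ["awful", "ok"], ["good", "ok"],
        ["great", "good"], ["great", "superb"]]
ratings = [1, 2, 3, 4, 5]
print(select_features(docs, ratings, 2))
```

On this toy data the selected terms are the ones whose presence tracks the rating monotonically ("awful" at the low end, "great" at the high end); the reduced vocabulary would then feed a support vector regression learner, in the spirit of the setup the abstract evaluates.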
