Using Micro-Documents for Feature Selection: The Case of Ordinal Text Classification

Most popular feature selection methods for text classification, such as information gain (also known as "mutual information"), chi-square, and odds ratio, are based on binary information indicating the presence or absence of the feature (or "term") in each training document. As such, these methods do not exploit a rich source of information, namely how frequently the feature occurs in each training document (term frequency). To overcome this drawback, when doing feature selection we logically break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the same class information as the original training document. This move has the double effect of (a) allowing all the original feature selection methods based on binary information to remain straightforwardly applicable, and (b) making them sensitive to term frequency information. We study the impact of this strategy in the case of ordinal text classification, a type of text classification that deals with classes lying on an ordinal scale and that has recently been made popular by applications in customer relationship management, market research, and Web 2.0 mining. We run experiments using four recently introduced feature selection functions, two learning methods of the support vector machines family, and two large datasets of product reviews. The experiments show that the use of this strategy substantially improves the accuracy of ordinal text classification.
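As a rough illustration of the micro-document idea, the following Python sketch splits each training document of length k into k single-token documents that keep the original label, and then scores a term with the same binary presence/absence function before and after the split. The function names (`to_micro_documents`, `information_gain`) and the toy two-review dataset are assumptions made for this example. Plain information gain over unordered classes is used only as a stand-in; it is not one of the four ordinal feature selection functions evaluated in the experiments.

```python
import math
from collections import Counter


def to_micro_documents(docs):
    """Split each (tokens, label) document of length k into k one-token
    documents, each carrying the label of the document it came from."""
    return [([tok], label) for tokens, label in docs for tok in tokens]


def information_gain(term, docs):
    """Mutual information (in bits) between the binary presence of `term`
    in a document and the document's class label."""
    n = len(docs)
    # Joint counts over (term present?, class label).
    joint = Counter((term in tokens, label) for tokens, label in docs)
    present = Counter()
    classes = Counter()
    for (is_present, label), count in joint.items():
        present[is_present] += count
        classes[label] += count
    ig = 0.0
    for (is_present, label), count in joint.items():
        p_joint = count / n
        p_indep = (present[is_present] / n) * (classes[label] / n)
        ig += p_joint * math.log2(p_joint / p_indep)
    return ig


# Hypothetical toy data: two reviews with star ratings 5 and 1.
docs = [
    (["great", "great", "great", "plot"], 5),
    (["great", "plot", "plot", "plot"], 1),
]
micro = to_micro_documents(docs)

ig_binary = information_gain("great", docs)   # document-level presence
ig_micro = information_gain("great", micro)   # micro-document-level presence
print(ig_binary, ig_micro)
```

On this toy data "great" occurs in both reviews, so its document-level score is zero: presence alone cannot separate the two classes. After the split, three of the four "great" micro-documents carry the 5-star label, so the same function assigns the term a positive score, which is the term frequency sensitivity the abstract describes.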
