Short Text Classification of Buyer-Initiated Questions in Online Auctions: A Score Assigning Method

Classification of short text (SMS messages, reviews, feedback, and the like) presents a distinct set of challenges compared with classic text classification. Short texts are characterized by cryptic constructions, poor spelling, and improper grammar, which make traditional methods difficult to apply. Classifying them correctly makes the information they carry usable for further action. We study this problem in the context of online auctions, specifically for buyer-initiated questions. This paper presents a score-assigning approach that outperforms traditional methods (e.g., Naive Bayes) in terms of accuracy.
