Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

Measuring the similarity between objects is a core research problem in data mining. To reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is applied to text classification. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently convert between qualitative concepts and quantitative data. After the text set is converted into a text information table based on the vector space model (VSM), the qualitative concepts extracted from texts of the same category are jumped up into a single category concept. The test text is then assigned to the category whose concept is most similar to it under the cloud similarity measure. Comparisons among different text classifiers on different feature selection sets show that CCJU-TC adapts well to different text features and also achieves better classification performance than the traditional classifiers.
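The final assignment step can be illustrated with a minimal sketch. It is not the CCJU-TC implementation. It assumes that a concept is summarized by the three numerical characteristics of a normal cloud (Ex, En, He), that these are estimated from a list of feature weights with the standard backward cloud generator without certainty degrees, and that cloud similarity is taken as the cosine of the angle between two (Ex, En, He) vectors. The function names `backward_cloud`, `cloud_similarity`, and `classify` are hypothetical, and the abstract does not state which similarity measure or jumping-up procedure the classifier actually uses.

```python
import math


def backward_cloud(samples):
    """Estimate (Ex, En, He) of a normal cloud from at least two numeric samples.

    Assumption: the backward cloud generator without certainty degrees.
    """
    n = len(samples)
    ex = sum(samples) / n
    # Entropy from the first-order absolute central moment.
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    # Sample variance; hyper-entropy is what the entropy leaves unexplained.
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)
    he = math.sqrt(abs(var - en ** 2))
    return (ex, en, he)


def cloud_similarity(a, b):
    """Cosine between two (Ex, En, He) vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(test_cloud, category_clouds):
    """Return the label of the category cloud most similar to the test cloud."""
    scores = {
        label: cloud_similarity(test_cloud, cloud)
        for label, cloud in category_clouds.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical feature weights, for illustration only.
    categories = {
        "sports": backward_cloud([0.9, 0.7, 0.8, 0.1, 0.0, 0.2]),
        "finance": backward_cloud([0.3, 0.3, 0.4, 0.3, 0.2, 0.3]),
    }
    test = backward_cloud([0.8, 0.9, 0.1])
    print(classify(test, categories))
```

In this sketch each category concept is built directly from a pooled list of weights. The jumping-up step described in the abstract, which merges the concepts of individual texts into one category concept, would replace that pooling.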
