A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification

Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters, based on similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We then have one extracted feature for each cluster. The extracted feature, corresponding to a cluster, is a weighted combination of the words contained in the cluster. By this algorithm, the derived membership functions match closely with and describe properly the real distribution of the training data. Besides, the user need not specify the number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted features can then be avoided. Experimental results show that our method can run faster and obtain better extracted features than other methods.

[1]  Dino Ienco,et al.  Exploration and Reduction of the Feature Space by Hierarchical Clustering , 2008, SDM.

[2]  Óscar W. Márquez Flórez,et al.  Experimental Results of the Signal Processing Approach to Distributional Clustering of Terms on Reuters-21578 Collection , 2007, ECIR.

[3]  Weiguo Fan,et al.  Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing , 2006, IEEE Transactions on Knowledge and Data Engineering.

[4]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[7]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[8]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[9]  Jeen-Shing Wang,et al.  Self-adaptive neuro-fuzzy inference systems for classification applications , 2002, IEEE Trans. Fuzzy Syst..

[10]  J. B. Rosen,et al.  Lower Dimensional Representation of Text Data Based on Centroids and Least Squares , 2003 .

[11]  Inderjit S. Dhillon,et al.  A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification , 2003, J. Mach. Learn. Res..

[12]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[13]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[14]  Daoqiang Zhang,et al.  Efficient and robust feature extraction by maximum margin criterion , 2003, IEEE Transactions on Neural Networks.

[15]  Hua Li,et al.  IMMC: incremental maximum margin criterion , 2004, KDD.

[16]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[17]  José Ranilla,et al.  Introducing a family of linear measures for feature selection in text categorization , 2005, IEEE Transactions on Knowledge and Data Engineering.

[18]  Hyunsoo Kim,et al.  Dimension Reduction in Text Classification with Support Vector Machines , 2005, J. Mach. Learn. Res..

[19]  David D. Lewis,et al.  Feature Selection and Feature Extraction for Text Categorization , 1992, HLT.

[20]  Daphne Koller,et al.  Toward Optimal Feature Selection , 1996, ICML.

[21]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[22]  Erkki Oja,et al.  Subspace methods of pattern recognition , 1983 .

[23]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[24]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[25]  Avinash C. Kak,et al.  PCA versus LDA , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Andrew McCallum,et al.  Distributional clustering of words for text classification , 1998, SIGIR '98.

[27]  Juyang Weng,et al.  Candid Covariance-Free Incremental Principal Component Analysis , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Hisham Al-Mubaid,et al.  A New Text Categorization Technique Using Distributional Clustering and Learning Logic , 2006, IEEE Transactions on Knowledge and Data Engineering.

[29]  Ran El-Yaniv,et al.  Distributional Word Clusters vs. Words for Text Categorization , 2003, J. Mach. Learn. Res..

[30]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[31]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[32]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[33]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[34]  Hiroshi Mizoguchi,et al.  Successive learning of linear discriminant analysis: Sanger-type algorithm , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[35]  Thorsten Joachims,et al.  A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization , 1997, ICML.

[36]  Naftali Tishby,et al.  The Power of Word Clusters for Text Classification , 2006 .

[37]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[38]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[39]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.