Difference-Similitude Matrix in Text Classification

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a method based on the Difference-Similitude Matrix (DSM) to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities and documents in different categories are described by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
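To make the idea concrete, the following is a minimal Python sketch and not the paper's algorithm. It assumes documents are sets of terms, that a "similarity" entry holds the terms two same-category documents share, and that a "difference" entry holds the terms present in exactly one of two documents from different categories. The names `build_dsm` and `greedy_reduct` are hypothetical, and the reduction step is a simple greedy covering heuristic swapped in for illustration; the DSM reduction algorithm itself is not specified in this abstract.

```python
from itertools import combinations


def build_dsm(docs, labels):
    """Build a difference-similitude style matrix over document pairs.

    docs   : list of sets of terms (assumed binary term presence)
    labels : list of category labels, one per document
    Returns a dict mapping (i, j) with i < j to (kind, terms).
    """
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        a, b = docs[i], docs[j]
        if labels[i] == labels[j]:
            # Same category: record what the two documents have in common.
            dsm[(i, j)] = ("similarity", frozenset(a & b))
        else:
            # Different categories: record the terms that tell them apart.
            dsm[(i, j)] = ("difference", frozenset(a ^ b))
    return dsm


def greedy_reduct(dsm):
    """Pick a small term set that still separates every cross-category pair.

    Illustrative greedy heuristic (an assumption, not the DSM reduction
    algorithm): repeatedly choose the term that appears in the most
    not-yet-covered difference entries.
    """
    uncovered = [terms for kind, terms in dsm.values()
                 if kind == "difference" and terms]
    selected = []
    while uncovered:
        counts = {}
        for terms in uncovered:
            for t in terms:
                counts[t] = counts.get(t, 0) + 1
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(counts, key=lambda t: (-counts[t], t))
        selected.append(best)
        uncovered = [terms for terms in uncovered if best not in terms]
    return selected


# Toy pre-classified collection (made-up data for illustration only).
docs = [
    {"goal", "match", "team"},
    {"goal", "team", "coach"},
    {"stock", "market", "team"},
    {"stock", "market", "bank"},
]
labels = ["sport", "sport", "finance", "finance"]

dsm = build_dsm(docs, labels)
reduct = greedy_reduct(dsm)
print(reduct)
```

On this toy collection a single term is enough to separate the two categories, which illustrates how keeping only the terms needed to preserve the difference entries can shrink the document space. Rules could then be read off the retained terms, for example "contains goal implies sport", again as an illustration rather than the paper's rule-generation procedure.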
