An Application on Text Classification Based on Granular Computing

ABSTRACT

Machine learning is the key to text classification. A granular computing approach to machine learning is applied to learning classification rules by considering two basic issues: concept formation and the identification of relationships between concepts. In this paper, we concentrate on the selection of a single granule at each step to construct a granule network, and we propose a classification rule induction method.

INTRODUCTION

Knowledge discovery and data mining are frequently described as the process of extracting interesting information or patterns from large databases. In practice, they are techniques or programs that perform automatic inductive reasoning: learning, identifying, and searching for knowledge, patterns, and regularities in data. Knowledge Discovery in Text (KDT), which uses text mining techniques to extract and induce hidden knowledge from unstructured text data, has surged in data and natural language processing research. KDT is a multidisciplinary field drawing on artificial intelligence and machine learning, with an emphasis on induction based on Information Extraction (IE) and on domain-specific practice. Text classification is one such practice.

Classification is one of the well-studied problems in machine learning and data mining because it involves the discovery of knowledge. Text classification is the task of deciding whether a piece of text belongs to any of a set of pre-specified categories. Automatic text classification comprises six steps: (1) setting up the data set (training set and testing set), (2) text knowledge indexing, (3) feature (keyword) extraction and selection, (4) designing a classifier through machine learning, (5) testing the classifier with the testing set, and (6) evaluating the method. Among these steps, designing the classifier is the most important.

Granular computing (GrC) is an umbrella term that covers any theories, methodologies, techniques, and tools that make use of granules (i.e., subsets of a universe) in problem solving.
A subset of the universe is called a granule in granular computing. The basic ingredients of granular computing are subsets, classes, and clusters of a universe. Granular computing characterizes a concept as a unit of thought consisting of two parts: the intension and the extension of the concept. In light of this, Y.Y. Yao presented a granular computing view of classification problems and proposed a granular computing approach to classification. He provided a model of data mining based on granular computing to solve the classification problem. This paper applies that model to text classification.

GRANULAR COMPUTING BASIS FOR CONSISTENT CLASSIFICATION PROBLEMS

A concept has two aspects: its intension and its extension. In the granular computing model for knowledge discovery, data mining, and classification, a set of objects is represented using an information table. The intension of a concept is expressed by a formula of the language, while the extension of a concept is represented as the set of objects satisfying the formula. This formulation enables us to study formal concepts in a logical setting in terms of intensions and in a set-theoretic setting in terms of extensions.

Representation of granules

To formalize the problem, an information table is introduced. An information table can be formulated as a tuple:

$$S = (U, At, L, \{V_a \mid a \in At\}, \{I_a \mid a \in At\}) \qquad (1)$$

where $U$ is a finite nonempty set of objects, $At$ is a finite nonempty set of attributes, $L$ is a language defined using the attributes in $At$, $V_a$ is a nonempty set of values for $a \in At$, and $I_a : U \to V_a$ is an information function. In the language $L$, an atomic formula is defined as $a = v$, where $a \in At$ and $v \in V_a$ …
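
The relationship between an information table, an atomic formula a = v, and the extension of the corresponding concept can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the toy documents `d1` to `d4`, the attributes `goal`, `market`, and `class`, and the helper name `extension`. The sketch only shows how the set of objects satisfying a formula (a granule) might be computed.

```python
from typing import Dict, Set

# Hypothetical toy information table S: the universe U is the set of keys,
# the attribute set At is {"goal", "market", "class"}, and each row plays
# the role of the information functions I_a (row[a] stands for I_a(x)).
# All names and values are invented for illustration.
InfoTable = Dict[str, Dict[str, str]]

TABLE: InfoTable = {
    "d1": {"goal": "yes", "market": "no", "class": "sports"},
    "d2": {"goal": "yes", "market": "no", "class": "sports"},
    "d3": {"goal": "no", "market": "yes", "class": "finance"},
    "d4": {"goal": "no", "market": "no", "class": "finance"},
}


def extension(table: InfoTable, attribute: str, value: str) -> Set[str]:
    """Return the granule {x in U | I_a(x) = v} for the atomic formula a = v."""
    return {x for x, row in table.items() if row[attribute] == value}


goal_docs = extension(TABLE, "goal", "yes")
sports_docs = extension(TABLE, "class", "sports")

print(sorted(goal_docs))         # objects satisfying goal = yes
print(goal_docs <= sports_docs)  # is this granule contained in the class granule?
```

In this toy table, the subset test asks whether every object satisfying `goal = yes` also belongs to the class `sports`. Reading such a containment as support for a classification rule is offered here only as one plausible interpretation, not as the paper's own procedure.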

[1] Yiyu Yao, et al. A Granular Computing Approach to Machine Learning, 2002, FSKD.

[2] Yiyu Yao, et al. Interactive classification using a granule network, 2005, Fourth IEEE Conference on Cognitive Informatics (ICCI 2005).

[3] Qing Liu, et al. Granular computing based text classification, 2006, 2006 IEEE International Conference on Granular Computing.

[4] Ah-Hwee Tan, et al. A Comparative Study on Chinese Text Categorization Methods, 2000, PRICAI Workshop on Text and Web Mining.

[5] Yiyu Yao, et al. Granular Computing as a Basis for Consistent Classification Problems, 2002.

[6] Yiyu Yao, et al. On modeling data mining with granular computing, 2001, 25th Annual International Computer Software and Applications Conference (COMPSAC 2001).

[7] Fabrizio Sebastiani, et al. Machine learning in automated text categorization, 2001, CSUR.

[8] Yiyu Yao, et al. Induction of Classification Rules by Granular Computing, 2002, Rough Sets and Current Trends in Computing.