Huge Data Mining Based on Rough Set Theory and Granular Computing

Data mining is an active research field that has been studied by many researchers and engineers for years. Nevertheless, mining huge data sets efficiently remains a very difficult problem, and many researchers are developing fast data mining techniques and methods for processing huge data sets. The basic idea of quick sort is the divide and conquer method, which embodies the idea of granular computing (GrC). Because the average time complexity of quick sort on a one-dimensional array of n records is n × log n, the average time complexity of quick sort on a table with m dimensions (attributes) and n records has usually been taken to be m × n × log n. However, we find that it is only n × (m + log n), not m × n × log n. Based on this finding, we hypothesize that the divide and conquer method can be used to improve existing knowledge reduction algorithms in rough set theory and granular computing, which may be a good way to solve the problem of huge data mining. In this paper, we present our research plan for huge data mining based on rough set theory and granular computing, together with our recent results.
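The n × (m + log n) claim can be made concrete with multikey (three-way) quicksort, which we assume here is close to the attribute-by-attribute divide and conquer described above; this is a sketch under that assumption, not the authors' algorithm. Each partition step compares records on a single attribute: records smaller or greater than the pivot stay on the same attribute, and only records equal to the pivot move on to the next attribute. Intuitively, a record therefore takes part in at most m "equal" steps plus, on average, O(log n) "smaller/greater" steps. The sorted order then yields the equivalence classes U/B as runs of adjacent identical rows, from which the positive region POS_B(D) and the attribute core (the attributes a with POS_{C-{a}}(D) ≠ POS_C(D)) follow. The function names (`sort_by`, `equivalence_classes`, `positive_region`, `core`) and the toy decision table are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Row = Tuple[int, ...]


def _mkqs(rows: List[Row], idx: List[int], lo: int, hi: int,
          d: int, attrs: Sequence[int]) -> None:
    """Sort idx[lo:hi] by attrs[d:], assuming those rows already agree on attrs[:d]."""
    while hi - lo > 1 and d < len(attrs):
        a = attrs[d]
        pivot = rows[idx[(lo + hi) // 2]][a]
        lt, i, gt = lo, lo, hi
        # Three-way partition on attribute a:
        # [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
        while i < gt:
            v = rows[idx[i]][a]
            if v < pivot:
                idx[lt], idx[i] = idx[i], idx[lt]
                lt += 1
                i += 1
            elif v > pivot:
                gt -= 1
                idx[gt], idx[i] = idx[i], idx[gt]
            else:
                i += 1
        _mkqs(rows, idx, lo, lt, d, attrs)  # smaller part: same attribute
        _mkqs(rows, idx, gt, hi, d, attrs)  # greater part: same attribute
        lo, hi, d = lt, gt, d + 1           # equal part: next attribute


def sort_by(rows: List[Row], attrs: Sequence[int]) -> List[int]:
    """Return record indices in lexicographic order of the given attributes."""
    idx = list(range(len(rows)))
    _mkqs(rows, idx, 0, len(rows), 0, attrs)
    return idx


def equivalence_classes(rows: List[Row], attrs: Sequence[int]) -> List[List[int]]:
    """U/B: after sorting, records with identical values on attrs are adjacent."""
    idx = sort_by(rows, attrs)

    def key(r: int) -> Tuple[int, ...]:
        return tuple(rows[r][a] for a in attrs)

    classes: List[List[int]] = []
    start = 0
    for k in range(1, len(idx) + 1):
        if k == len(idx) or key(idx[k]) != key(idx[start]):
            classes.append(idx[start:k])
            start = k
    return classes


def positive_region(rows: List[Row], attrs: Sequence[int], dec: int) -> Set[int]:
    """POS_B(D): union of the B-classes whose records all share one decision value."""
    pos: Set[int] = set()
    for cls in equivalence_classes(rows, attrs):
        if len({rows[r][dec] for r in cls}) == 1:
            pos.update(cls)
    return pos


def core(rows: List[Row], cond: Sequence[int], dec: int) -> List[int]:
    """Condition attributes a with POS_{C-{a}}(D) != POS_C(D)."""
    full = positive_region(rows, cond, dec)
    return [a for a in cond
            if positive_region(rows, [b for b in cond if b != a], dec) != full]


# Hypothetical toy decision table: condition attributes 0..2, decision attribute 3.
table = [(1, 0, 1, 0), (1, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 0), (0, 0, 1, 0)]
print(sort_by(table, [0, 1, 2]))   # [4, 2, 3, 0, 1]
print(core(table, [0, 1, 2], 3))   # [1]
```

The sort examines an attribute only for records that are still tied on all earlier attributes, which is where the saving over comparing whole m-attribute rows at every step comes from. The core computation above simply re-sorts once per candidate attribute; it illustrates the positive-region definition and makes no claim about the authors' improved reduction algorithms.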
