Applications of Fuzzy and Rough Set Theory in Data Mining

The explosion of very large databases has created extraordinary opportunities for monitoring, analyzing, and predicting global economic, geographical, demographic, medical, political, and other processes. Statistical analysis and data mining techniques have emerged for these purposes. Data mining is the process of discovering previously unknown but potentially useful patterns, rules, or associations in large quantities of data. It can be performed on many kinds of data repositories, such as relational databases, data warehouses, transactional databases, sequence databases, spatial databases, spatio-temporal databases, and text databases. Data mining functionalities are typically classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in a database, while predictive mining tasks perform inference on the current data in order to make predictions about future data.
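The following minimal Python sketch illustrates the descriptive/predictive distinction on a small, made-up table of records. The data, field layout, and the function names `describe` and `predict` are hypothetical and serve only as an illustration. The descriptive task summarizes the data already stored (per-class feature means). The predictive task uses that data to label an unseen record; it uses a plain 1-nearest-neighbor rule, which is only one of many possible predictors.

```python
from statistics import mean

# Hypothetical toy records: (feature_1, feature_2, class_label).
records = [
    (1.0, 1.2, "low"),
    (0.8, 1.0, "low"),
    (3.9, 4.1, "high"),
    (4.2, 3.8, "high"),
]


def describe(rows):
    """Descriptive mining: characterize the existing data by
    computing the mean of each feature within each class."""
    groups = {}
    for x1, x2, label in rows:
        groups.setdefault(label, []).append((x1, x2))
    return {
        label: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for label, pts in groups.items()
    }


def predict(rows, query):
    """Predictive mining: infer the label of an unseen record from the
    current data with a 1-nearest-neighbor rule (squared Euclidean distance)."""
    qx1, qx2 = query
    nearest = min(rows, key=lambda r: (r[0] - qx1) ** 2 + (r[1] - qx2) ** 2)
    return nearest[2]


if __name__ == "__main__":
    print(describe(records))
    print(predict(records, (3.5, 3.9)))
```

For example, `predict(records, (3.5, 3.9))` returns `"high"`, because the closest stored record is `(3.9, 4.1, "high")`.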