An improved naive Bayesian classifier technique coupled with a novel input solution method [rainfall prediction]

Data mining is the study of how to uncover underlying patterns in data to support decision making when the database involved is voluminous, hard to characterize accurately, and constantly changing. It deploys techniques based on machine learning alongside more conventional methods. These techniques can generate decision or prediction models from actual historical data and therefore provide evidence-based decision support. Rainfall prediction is a problem well suited to data mining techniques. This paper proposes an improved naive Bayesian classifier (INBC) technique and explores the use of genetic algorithms (GAs) for selecting a subset of input features in classification problems. It then compares the following algorithms on real meteorological data from Hong Kong: (1) genetic algorithms with average classification or general classification (GA-AC and GA-C), (2) C4.5 with pruning, and (3) INBC with relative frequency or initial probability density (INBC-RF and INBC-IPD). Two simple schemes are proposed to construct a suitable data set for improving their performance. Scheme I uses all the basic input parameters for rainfall prediction. Scheme II uses the optimal subset of input variables selected by a GA. The results show that, among the methods compared, INBC achieved about a 90% accuracy rate on the rain/no-rain classification problems. It also attained reasonable performance on rainfall prediction with three-level depth and five-level depth, with accuracy rates of around 65%-70%.
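The general idea behind Scheme II, a GA searching for an input subset that a naive Bayesian classifier then uses, can be illustrated with a minimal sketch. This is not the paper's INBC or its GA-AC/GA-C variants: the classifier below is a plain categorical naive Bayes with relative-frequency estimates and Laplace smoothing, the GA settings (`pop_size`, `generations`, `p_mut`) are arbitrary illustrative values, and the data set is synthetic, with one informative feature and two noise features invented for the example.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(rows, labels, features):
    """Count class and per-feature value frequencies for the chosen features."""
    class_counts = Counter(labels)
    cond_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)           # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for f in features:
            cond_counts[(f, y)][row[f]] += 1
            values[f].add(row[f])
    return class_counts, cond_counts, values


def predict_nb(model, row, features):
    """Return the class with the highest log-posterior under naive Bayes."""
    class_counts, cond_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # relative-frequency prior
        for f in features:
            # Laplace-smoothed relative frequency (an assumption of this sketch)
            count = cond_counts[(f, y)][row[f]]
            score += math.log((count + 1) / (n_y + len(values[f])))
        if score > best_score:
            best, best_score = y, score
    return best


def accuracy(features, train, valid):
    """Validation accuracy of naive Bayes restricted to the given feature indices."""
    if not features:
        return 0.0
    model = train_nb(train[0], train[1], features)
    hits = sum(predict_nb(model, r, features) == y for r, y in zip(*valid))
    return hits / len(valid[1])


def ga_select(n_features, fitness, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Simple GA over boolean feature masks (requires n_features >= 2)."""
    rng = random.Random(seed)

    def score(mask):
        return fitness([i for i, keep in enumerate(mask) if keep])

    # Start from the full feature set (Scheme I) plus random masks.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=score)  # tournament selection
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return [i for i, keep in enumerate(best) if keep]


def make_toy_data(n, seed):
    """Synthetic data: feature 0 decides the label, features 1 and 2 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        humidity = rng.choice(["low", "high"])
        row = (humidity, rng.choice(["a", "b", "c"]), rng.choice(["x", "y"]))
        rows.append(row)
        labels.append("rain" if humidity == "high" else "no-rain")
    return rows, labels


train = make_toy_data(200, seed=1)
valid = make_toy_data(100, seed=2)
selected = ga_select(3, lambda fs: accuracy(fs, train, valid))
print("selected features:", selected)
print("validation accuracy:", accuracy(selected, train, valid))
```

Because the initial population contains the full feature set and the best mask is carried over each generation, the selected subset in this sketch never scores worse on the validation data than using all inputs. In practice the fitness would be measured on data held out from the final evaluation.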
