This paper presents a novel probability neural network (PNN) that can classify data containing both continuous and categorical input variables. A mixture model of continuous and categorical variables is proposed to construct the probability density function (PDF) that is the key component of the PNN. The proposed PNN has two advantages over conventional algorithms such as the multilayer perceptron (MLP) neural network. First, the PNN produces better results than the MLP when the input data set includes both continuous and categorical types, even when the MLP uses normalised input variables, which normally yield better MLP results than non-normalised ones. Second, the PNN requires no cross-validation data set and does not suffer from over-training as the MLP does. Both advantages are demonstrated in our experimental study. The proposed PNN can also be used for unsupervised cluster analysis. Its superiority over the MLP neural network, the radial basis function (RBF) neural network, and the C4.5 and Random Forest decision trees is demonstrated by applying all of them to two real-life data sets, the Heart Disease and Trauma data sets, both of which include continuous and categorical variables.
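The paper's exact mixture formulation is not reproduced here, but the core idea of a PNN-style classifier whose kernel density estimate factorises over continuous and categorical dimensions can be sketched as follows. This is a minimal illustration, assuming a product of a Gaussian kernel on the continuous features and an Aitchison-Aitken-style match/mismatch kernel on the categorical features; the function names, the bandwidth h, and the smoothing parameter lam are illustrative choices, not the parameterisation used in the paper.

    import numpy as np

    def mixed_kernel(x, xi, cont_idx, cat_idx, h=0.5, lam=0.9):
        # Gaussian kernel over the continuous dimensions
        sq_dist = np.sum((x[cont_idx] - xi[cont_idx]) ** 2)
        k_cont = np.exp(-sq_dist / (2.0 * h ** 2))
        # Aitchison-Aitken-style kernel over the categorical dimensions:
        # weight lam on a category match, (1 - lam) on a mismatch
        k_cat = np.prod(np.where(x[cat_idx] == xi[cat_idx], lam, 1.0 - lam))
        return k_cont * k_cat

    def pnn_predict(X_train, y_train, x, cont_idx, cat_idx):
        # Parzen-style class-conditional density estimate at x for each
        # class; the predicted class is the one with the largest estimate
        classes = np.unique(y_train)
        scores = [np.mean([mixed_kernel(x, xi, cont_idx, cat_idx)
                           for xi in X_train[y_train == c]])
                  for c in classes]
        return classes[int(np.argmax(scores))]

    # Toy usage: column 0 is continuous, column 1 is a categorical code
    X = np.array([[0.1, 0], [0.2, 0], [0.9, 1], [1.0, 1]])
    y = np.array([0, 0, 1, 1])
    print(pnn_predict(X, y, np.array([0.15, 0]), cont_idx=[0], cat_idx=[1]))  # -> 0

Because the class score is just an average of kernel evaluations over that class's training points, there are no weights to fit iteratively, which is consistent with the abstract's claim that no cross-validation set is needed and over-training does not arise.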