An Efficient Data Reduction Method to Mine Herbalist Medical Diagnostic Rules from Large Case Repository with High Dimensional Data

An effective data reduction method is proposed for mining herbalist medical diagnostic rules from a large case repository with high-dimensional data. Using two concepts, the Feature Dissimilarity of a Set and the Patient Feature Vector, the method compresses the data without information loss and thereby greatly reduces its scale. It then groups the patients, who are described by a large number of symptoms and demographic features, into several clusters of much lower dimensionality, and removes the irrelevant attributes from each cluster. Finally, it obtains the disease classification rules by training a neural network on each cluster, using far fewer attributes. Because of this data compression and dimensionality reduction, the method is both effective and efficient.