Approximately duplicate record detection method based on neural network and genetic algorithm

To detect approximately duplicate records effectively during data cleaning, a method based on a neural network and a genetic algorithm is proposed. First, the method measures the similarity of each pair of corresponding fields in two records. Then a neural network model for detection is constructed, and a genetic algorithm is used to optimize the weights of the network. Finally, the neural network, trained on a set of sample record pairs, classifies each record pair as duplicate or non-duplicate. Experimental results on a range of datasets show that this method improves the accuracy and precision of duplicate detection over traditional methods.
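
The three steps described above can be illustrated with a minimal sketch. Nothing in it is taken from the paper itself: the field similarity measure (Python's difflib ratio), the single hidden layer, the fitness function (negative mean squared error), and the genetic operators (tournament selection, uniform crossover, Gaussian mutation, elitism) are all assumptions chosen for brevity, and the function names and sample records are hypothetical.

```python
import math
import random
from difflib import SequenceMatcher


def field_similarities(rec_a, rec_b):
    """Step 1: similarity in [0, 1] for each pair of corresponding fields.
    Assumption: difflib's ratio stands in for the paper's similarity measure."""
    return [SequenceMatcher(None, a.lower(), b.lower()).ratio()
            for a, b in zip(rec_a, rec_b)]


def predict(weights, sims, n_hidden):
    """Step 2: forward pass of a one-hidden-layer network, returning a
    duplicate score in [0, 1]. Weights are stored as one flat list."""
    n_in = len(sims)
    hidden, idx = [], 0
    for _ in range(n_hidden):
        # n_in input weights followed by one bias per hidden unit
        z = weights[idx + n_in] + sum(
            w * s for w, s in zip(weights[idx:idx + n_in], sims))
        hidden.append(math.tanh(z))
        idx += n_in + 1
    z = weights[idx + n_hidden] + sum(
        w * h for w, h in zip(weights[idx:idx + n_hidden], hidden))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def fitness(weights, samples, n_hidden):
    """Negative mean squared error over labeled (similarities, label) samples."""
    return -sum((predict(weights, x, n_hidden) - y) ** 2
                for x, y in samples) / len(samples)


def evolve_weights(samples, n_hidden=3, pop_size=30, generations=60, seed=0):
    """Step 2 (continued): a simple genetic algorithm over weight vectors."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    n_w = n_hidden * (n_in + 1) + n_hidden + 1

    def score(w):
        return fitness(w, samples, n_hidden)

    pop = [[rng.uniform(-1, 1) for _ in range(n_w)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: the two best individuals survive unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0, 0.3) if rng.random() < 0.1 else g
                     for g in child]  # Gaussian mutation on ~10% of genes
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)


if __name__ == "__main__":
    # Hypothetical labeled record pairs: 1 = duplicate, 0 = non-duplicate.
    pairs = [
        (("John Smith", "12 Main St", "Boston"),
         ("Jon Smith", "12 Main Street", "Boston"), 1),
        (("Mary Jones", "4 Oak Ave", "Denver"),
         ("Mary Jones", "4 Oak Avenue", "Denver"), 1),
        (("John Smith", "12 Main St", "Boston"),
         ("Alice Wong", "77 Pine Rd", "Seattle"), 0),
        (("Mary Jones", "4 Oak Ave", "Denver"),
         ("Peter Brown", "9 Elm Ct", "Austin"), 0),
    ]
    samples = [(field_similarities(a, b), y) for a, b, y in pairs]
    best = evolve_weights(samples)
    # Step 3: score an unseen pair; above 0.5 would be read as "duplicate".
    query = field_similarities(("Jon Smyth", "12 Main St", "Boston"),
                               ("John Smith", "12 Main St", "Boston"))
    print(predict(best, query, n_hidden=3))
```

Because the two best individuals are copied unchanged into each new generation, the best fitness never decreases between generations. A realistic setup would need far more training pairs and a held-out test set before accuracy or precision could be compared with other methods.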