A study on company name matching for database integration

In this report we describe an information integration activity performed on databases containing patent data and company indicators. Depending on the application area, this kind of activity is known as record linkage, duplicate detection, record matching, reference reconciliation, or by other domain-specific terms. In particular, we present a detailed case study on company name matching, a task that relies on approximate string matching techniques. We show how to choose and tune existing methods for this domain, and describe an efficient implementation able to process large volumes of data. We then present the experimental results obtained on real data sets, highlighting the pros and cons of approximate string matching in this specific domain, and analyze the impact of domain knowledge on the results of the matching activity.

Department of Computer Science, University of Bologna, Via Mura A. Zamboni 7, 40127 Bologna, Italy.