Entity Identity Resolution

This chapter examines the root cause of the “dual challenge” of identity resolution and describes how parsing and standardization contribute to the process. It also reviews the ways that similarity scoring and approximate matching algorithms can help recognize and resolve identical entities despite variant representations. It examines errors in record linkage and considers which data values survive the resolution process. Identity resolution employs techniques for measuring the degree of similarity between any two records, often based on weighted approximate matching between a set of attribute values in the two records. The selection of an identity resolution tool must be accompanied by a process for analyzing the suitability of entity data elements as candidate identifying attributes. This assessment must consider a number of factors, especially how well the selected attributes help meet the dual challenge of unique identification: differentiating entities and matching records. By applying approximate matching techniques to sets of those identifying attributes, identity resolution can recognize when slight variations suggest that different records are connected, when values may be cleansed, or when the differences between the data are large enough to suggest that the two records truly represent distinct entities. Identity resolution is a critical component of data quality, master data management, and business intelligence applications.
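To make the idea of weighted approximate matching concrete, the following is a minimal sketch rather than a definitive implementation. The attribute names, the weights, and the two thresholds are hypothetical values chosen for illustration, and the standard library's difflib.SequenceMatcher stands in for whatever approximate matching algorithm a particular tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical identifying attributes and weights (assumptions, not from the text).
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def standardize(value):
    """Very light standardization: lowercase and collapse whitespace."""
    return " ".join(str(value).lower().split())


def attribute_similarity(a, b):
    """Approximate similarity of two attribute values, from 0.0 to 1.0."""
    a, b = standardize(a), standardize(b)
    if not a or not b:
        # A missing value gives no evidence of a match.
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of the per-attribute similarity scores."""
    total = sum(weights.values())
    score = sum(
        weight * attribute_similarity(rec_a.get(attr, ""), rec_b.get(attr, ""))
        for attr, weight in weights.items()
    )
    return score / total


def classify(score, match_threshold=0.85, review_threshold=0.60):
    """Map a score to a decision; both thresholds are illustrative assumptions."""
    if score >= match_threshold:
        return "match"
    if score >= review_threshold:
        return "possible match"
    return "distinct"


if __name__ == "__main__":
    a = {"name": "Robert Smith", "address": "12 Main Street", "phone": "555-0100"}
    b = {"name": "Robert  SMITH", "address": "12 Main St.", "phone": "555-0100"}
    score = record_similarity(a, b)
    print(round(score, 3), classify(score))
```

In practice the weights would reflect how strongly each attribute distinguishes one entity from another, and the thresholds would be tuned against the observed rates of false matches and missed matches.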