A Successful Technique for Removing Names in Pathology Reports Using an Augmented Search and Replace Method

The ability to access large amounts of de-identified clinical data would facilitate epidemiologic and retrospective research. Previously described de-identification methods require knowledge of natural language processing or have not been made available to the public. We take advantage of the fact that the vast majority of proper names in pathology reports occur in pairs. In rare cases where one proper name is by itself, it is preceded or followed by an affix that identifies it as a proper name (Mrs., Dr., PhD). We created a tool based on this observation, using substitution methods, that was easy to implement and was largely based on publicly available data sources. We compiled a Clinical and Common Usage Word (CCUW) list as well as a fairly comprehensive proper name list. Despite the large overlap between these two lists, we were able to refine our methods to achieve accuracy similar to previous attempts at de-identification. Our method found 98.7% of 231 proper names in the narrative sections of pathology reports. Three single proper names were missed out of 1001 pathology reports (0.3%, no first name/last name pairs). It is unlikely that identification could be implied from this information. We will continue to refine our methods, specifically working to improve the quality of our CCUW and proper name lists to obtain higher levels of accuracy.
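The pair-and-affix observation can be illustrated with a minimal Python sketch. It is not the tool described above: the word lists are tiny hypothetical stand-ins, the function name `deidentify` and the `[NAME]` placeholder are invented for illustration, and the way the overlap between the two lists is resolved (a pair is replaced only if at least one of its words is absent from the CCUW list) is an assumption, since the abstract does not state the actual rule.

```python
import re

# Hypothetical toy lists; real lists would be far larger.
PROPER_NAMES = {"john", "smith", "mary", "brown", "white"}
CCUW = {"brown", "white", "tissue", "seen", "by", "the", "patient", "is", "and"}
PREFIXES = {"mr", "mrs", "ms", "dr"}   # affixes that precede a name
SUFFIXES = {"md", "phd"}               # affixes that follow a name

WORD = re.compile(r"[A-Za-z]+")


def deidentify(text, placeholder="[NAME]"):
    """Replace proper names found in pairs or next to an identifying affix."""
    words = list(WORD.finditer(text))
    lower = [m.group().lower() for m in words]
    is_name = [w in PROPER_NAMES for w in lower]
    flagged = [False] * len(words)

    def gap(i):
        # Text between word i and word i + 1 (punctuation and whitespace).
        return text[words[i].end():words[i + 1].start()]

    for i in range(len(words) - 1):
        # Rule 1: two adjacent name-list words separated only by whitespace.
        # Assumption: at least one must be outside the CCUW list, so that
        # ordinary phrases such as "brown white" are left alone.
        if is_name[i] and is_name[i + 1] and gap(i).isspace():
            if lower[i] not in CCUW or lower[i + 1] not in CCUW:
                flagged[i] = flagged[i + 1] = True
        # Rule 2a: a prefix such as "Dr." directly before a name-list word.
        if lower[i] in PREFIXES and is_name[i + 1] and re.fullmatch(r"\.?\s+", gap(i)):
            flagged[i + 1] = True
        # Rule 2b: a suffix such as ", PhD" directly after a name-list word.
        if is_name[i] and lower[i + 1] in SUFFIXES and re.fullmatch(r",?\s+", gap(i)):
            flagged[i] = True

    # Rebuild the text, substituting the placeholder for flagged words.
    out, pos = [], 0
    for match, hit in zip(words, flagged):
        if hit:
            out.append(text[pos:match.start()])
            out.append(placeholder)
            pos = match.end()
    out.append(text[pos:])
    return "".join(out)


print(deidentify("Seen by Dr. Brown. The brown tissue is from John Smith."))
```

In this sketch, "Brown" after "Dr." is replaced while the color word "brown" before "tissue" is kept, which is the kind of ambiguity the overlap between the two lists creates. A single name with no affix and no neighboring name is left untouched, matching the class of misses reported above.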