An application of least squares fit mapping to clinical classification.

This paper describes a unique approach to clinical data classification, "Least Squares Fit Mapping." We use large collections of human-assigned text-to-category matches as training sets to compute the correlations between physicians' terms and canonical concepts. A Linear Least Squares Fit (LLSF) technique is used to obtain a mapping function that optimally fits the known matches in a training set and probabilistically captures the unknown matches for arbitrary texts. We tested our method on 16,032 texts from the Mayo Clinic and judged the results against human-assigned answers. In a comparative test, the LLSF mapping achieved a precision of 89% at 100% recall, outperforming alternative approaches including string matching (36% precision), string matching enhanced by morphological parsing (51% precision), and statistical weighting (61% precision).
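
To make the LLSF idea concrete, the following is a minimal sketch, not the implementation evaluated above. It assumes a bag-of-words term matrix A (terms by training texts), a one-hot category matrix B (categories by training texts), and a mapping W chosen to minimize the Frobenius norm of WA - B, computed here with NumPy's pseudo-inverse. The training pairs, the function names, and the one-hot encoding of categories are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy training set: (physician's text, canonical category).
TRAIN = [
    ("heart attack", "myocardial infarction"),
    ("myocardial infarction", "myocardial infarction"),
    ("kidney stone", "nephrolithiasis"),
    ("renal calculus", "nephrolithiasis"),
    ("heart failure", "cardiac failure"),
]

def build_vocab(texts):
    """Map each distinct lowercase token to a row index."""
    tokens = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def term_matrix(texts, vocab):
    """Terms-by-texts count matrix; out-of-vocabulary tokens are ignored."""
    M = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for tok in t.lower().split():
            if tok in vocab:
                M[vocab[tok], j] += 1.0
    return M

def fit_llsf(A, B):
    """Minimum-norm W minimizing ||W A - B||_F, via the pseudo-inverse of A."""
    return B @ np.linalg.pinv(A)

def rank_categories(text, W, vocab, categories):
    """Score every category for a new text and sort by descending score."""
    x = term_matrix([text], vocab)[:, 0]
    scores = W @ x
    order = np.argsort(-scores)
    return [(categories[i], float(scores[i])) for i in order]

texts = [t for t, _ in TRAIN]
labels = [c for _, c in TRAIN]
vocab = build_vocab(texts)
categories = sorted(set(labels))

A = term_matrix(texts, vocab)                 # terms x training texts
B = np.zeros((len(categories), len(texts)))   # categories x training texts
for j, c in enumerate(labels):
    B[categories.index(c), j] = 1.0

W = fit_llsf(A, B)

# "renal stone" never appears as a training text, but both words do.
ranking = rank_categories("renal stone", W, vocab, categories)
print(ranking[0][0])
```

In this toy setting the query "renal stone" matches no training text as a string, yet the learned mapping still ranks "nephrolithiasis" first because each word was seen with that category. A realistic system would need term weighting and a much larger training set; those details are outside what this sketch assumes.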