Discovering and Describing Category Differences: What makes a discovered difference insightful?

Stephen D. Bay (sbay@ics.uci.edu)
Michael J. Pazzani (pazzani@ics.uci.edu)
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92697, USA

Abstract

Many organizations have turned to computer analysis of their data to deal with the explosion of available electronic data. The goal of this analysis is to gain insight and new knowledge about their core activities. A common query is comparing several different categories (e.g., customers who default on loans versus those who don't) to discover previously unknown differences between them. Current mining algorithms can produce rules that differentiate the groups with high accuracy, but human domain experts often find these results neither insightful nor useful. In this paper, we take a step toward understanding how humans interpret discovered rules by presenting a case study: we compare the responses of admissions officers (domain experts) to the output of two data mining algorithms that attempt to find out why admitted students choose to enroll or not enroll at UC Irvine. We analyze the responses and identify several factors that affect what makes the discovered rules insightful.

Introduction

Data collection is a daily activity of many organizations in business, science, education, and medicine. Large databases are routinely collected, and with the advent of computers to process the information, these organizations want to analyze the data to gain insight and knowledge about the process underlying the data. The data usually represent information on their core business, and an important task is understanding the differences between various client groups. For example, bank loan officers may be interested in analyzing historical loan data to understand the differences between people who are good and poor credit risks. Admissions officers at UC Irvine (UCI) are interested in analyzing admissions data to understand the factors that influence an admitted student's choice to enroll at UCI. It is important that the discovered differences be true and accurate descriptions of the data as well as acceptable and understandable to the end users.

A common technique for discovering group differences from data is to apply a data mining algorithm to automatically find rules from the data. For example, after analyzing loan data we might find that people with graduate degrees are good loan risks (i.e., grad-degree → low-risk). There have been many studies that investigate the accuracy of rules describing category differences, but very few that investigate how humans interpret the results.

In this paper, we focus on two issues relating to the interpretation of discovered rules by human domain experts. First, algorithms for automatically finding group differences can be categorized broadly into discriminative and characteristic (or informative) approaches (Rubinstein & Hastie, 1997). In discriminative approaches, the algorithms attempt to find differences that can be directly used to classify the instances of the groups. In characteristic approaches, the algorithms attempt to find differences in the class descriptions, some of which may also be highly predictive but are not necessarily so. We investigate whether human domain experts have a preference for either strategy.
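To make the distinction concrete, the following minimal sketch contrasts the two strategies on a handful of hypothetical loan records. It is our illustration, not code or data from the study; the feature names (grad-degree, owns-home) and the records themselves are invented for the example.

# A minimal sketch (illustrative only) contrasting the discriminative and
# characteristic strategies on hypothetical loan records of the form
# (has-grad-degree, owns-home, low-risk).
loans = [(True, True, True), (True, False, True), (True, True, False),
         (False, True, True), (False, False, False), (False, False, False)]
FEATURES = ["grad-degree", "owns-home"]

# Discriminative: keep the rule "feature -> low-risk" that classifies the
# records most accurately.
def accuracy(i):
    return sum(r[i] == r[2] for r in loans) / len(loans)

best = max(range(len(FEATURES)), key=accuracy)
print(f"discriminative rule: {FEATURES[best]} -> low-risk "
      f"(accuracy {accuracy(best):.2f})")

# Characteristic: describe each group by P(feature | class); a gap between
# the classes is worth reporting even if the feature alone classifies poorly.
n_pos = sum(r[2] for r in loans)
n_neg = len(loans) - n_pos
for i, name in enumerate(FEATURES):
    p_pos = sum(r[i] for r in loans if r[2]) / n_pos
    p_neg = sum(r[i] for r in loans if not r[2]) / n_neg
    print(f"P({name} | low-risk)={p_pos:.2f}  P({name} | high-risk)={p_neg:.2f}")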
Second, there are many objective measures of rule quality, and mining algorithms typically seek rules that optimize these measures. For example, with if-then rules of the form A → C (antecedent implies consequent), many algorithms attempt to maximize the confidence, which is the conditional probability of the consequent being true given the antecedent, P(C | A). The assumption is that rules that score highly on the objective measure are useful to domain experts. The problem is that while there are many objective measures of pattern quality, such as support (Agrawal, Imielinski, & Swami, 1993), confidence (Agrawal et al., 1993), lift (also known as interest) (Brin, Motwani, Ullman, & Tsur, 1997), conviction (Brin et al., 1997), and many others, none of these measures truly correlates with what human domain experts find interesting, useful, or acceptable. (A small sketch showing how these measures are computed appears at the end of this section.) The reality is that most mined results are not useful at all. For example, Major and Mangano (1995) analyzed rules from a hurricane database and reduced 161 rules to 10 "genuinely interesting" rules. In a more extreme, but common, case, Brin et al. found over 20,000 rules in a census database, from which they learned that "five year olds don't work, unemployed residents don't earn income from work, men don't give birth" and other uninteresting facts. Thus we investigate the relationship between human subjective measures of rule usefulness and objective measures of rule quality.

We answer our research questions, "Is a discriminative or characteristic approach more useful for describing group differences?" and "How do subjective and objective measures of rule interest relate to each other?", by reporting on an analysis of discovered rules by human domain experts. We analyzed UCI admissions data to understand the groups of students who decide to enroll or not enroll at UCI given an offer of admission. After discovering rules with two different algorithms, we showed the rules to human domain experts and asked them to rate the rules according to their insightfulness, i.e., did the rule expand their knowledge about the admissions process? After obtaining the experts' ratings, we analyzed the responses to compare and contrast discriminative and characteristic approaches, as well as objective and subjective measures of rule quality.
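Here is the sketch referred to above: a minimal computation of the four objective measures for a single rule A → C, again on hypothetical loan records. The predicates and records are assumptions of the example, not data from the admissions study or the cited papers.

# A minimal sketch (illustrative only) of the objective rule-quality measures
# discussed above, computed for one rule A -> C over hypothetical loan
# records of the form (has-grad-degree, low-risk).
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift, conviction) for A -> C."""
    n = len(records)
    n_a = sum(antecedent(r) for r in records)
    n_c = sum(consequent(r) for r in records)
    n_ac = sum(antecedent(r) and consequent(r) for r in records)
    assert n_a > 0, "antecedent must cover at least one record"

    support = n_ac / n                     # P(A and C)
    confidence = n_ac / n_a                # P(C | A)
    lift = confidence / (n_c / n)          # P(C | A) / P(C)
    # Conviction = P(A)P(not C) / P(A and not C); infinite for a perfect rule.
    conviction = (float("inf") if confidence == 1.0
                  else (1 - n_c / n) / (1 - confidence))
    return support, confidence, lift, conviction

loans = [(True, True), (True, True), (True, False),
         (False, True), (False, False), (False, False)]

s, c, l, v = rule_measures(loans,
                           antecedent=lambda r: r[0],  # grad-degree
                           consequent=lambda r: r[1])  # low-risk
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f} conviction={v:.2f}")

On these six toy records, the rule grad-degree → low-risk scores support 0.33, confidence 0.67, lift 1.33, and conviction 1.50; each number summarizes a different aspect of the rule, which is one reason no single measure tracks expert judgments.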