On learning in the presence of unspecified attribute values

We continue the study of learning in the presence of unspecified attribute values (UAV), where some of the attributes of the examples may be unspecified [10, 5]. A UAV assignment x ∈ {0, 1, *}^n, where * indicates unspecified, is classified positive (negative) with respect to a Boolean function f if all possible assignments for the unspecified attributes result in a positive (negative) classification. Otherwise, the classification of x is *. Given an example x ∈ {0, 1, *}^n, the oracle UAV-MQ(x) responds with the classification of x with respect to the unknown target. Given a hypothesis h, the oracle UAV-EQ returns an example x ∈ {0, 1, *}^n for which h(x) is incorrect, if such an example exists. The new contributions of this paper are as follows. First, we define a new oracle called the relevant variable oracle, or RV, which takes as input a subcube of {0, 1}^n and returns a relevant variable of the target on this subcube, if one exists. We then show that a class is query learnable using UAV-MQs if and only if it is query learnable using MQs and an RV. Next, we give a lower bound for the number of UAV-MQs required to learn k-term DNF. After this, we investigate the learnability of CDNF with UAV-MQs. The two main results of this investigation are 1) if class C is learnable as CDNF using MQs and EQs, then it is learnable using UAV-MQs, and 2) CDNF is query learnable using only UAV-MQs (the algorithm is not time efficient). We then give efficient learning algorithms using UAV-MQs for the class of rank-k decision trees and the class of Boolean functions of a constant number of terms or clauses. The former of these two results leads to a quasipolynomial time UAV-MQ algorithm for decision trees with polynomial size and CDNF with polynomial size.
Finally, we answer an open problem posed in both [10] and [5] by showing that decision trees are learnable using UAV-MQs and UAV-EQs.

David K. Wilson
Department of Computer Science, University of Calgary
2500 University Dr. N.W., Calgary, AB, Canada T2N 1N4
wilsond@cpsc.ucalgary.ca

COLT '99, 7/99, Santa Cruz, CA, USA. © 1999 ACM 1-58113-167-4/99/0006...$5.00
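The oracle definitions in the abstract can be made concrete with a small brute-force sketch. This is an illustration only, not an algorithm from the paper: it simulates the oracles by enumerating every completion of the unspecified attributes, which takes time exponential in the number of unspecified attributes. The names `STAR`, `completions`, `uav_mq`, and `relevant_variable` are hypothetical, and the target `f` is assumed to be a Python function that maps a tuple of 0/1 values to 0 or 1.

```python
from itertools import product

STAR = "*"  # hypothetical marker for an unspecified attribute


def completions(x):
    """Yield every full 0/1 assignment consistent with the UAV assignment x."""
    free = [i for i, v in enumerate(x) if v == STAR]
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        yield tuple(y)


def uav_mq(f, x):
    """Simulated UAV-MQ: 1 or 0 if f is constant on the subcube x, else '*'."""
    values = {f(y) for y in completions(x)}
    return values.pop() if len(values) == 1 else STAR


def relevant_variable(f, x):
    """Simulated RV: index of an unspecified variable that f depends on
    within the subcube x, or None if f is constant there."""
    for i, v in enumerate(x):
        if v != STAR:
            continue
        for y in completions(x):
            if y[i] == 0:
                flipped = y[:i] + (1,) + y[i + 1:]
                if f(y) != f(flipped):
                    return i
    return None


# Example target (an assumption for illustration): f(y) = y[0] AND y[1].
f = lambda y: int(y[0] and y[1])
```

With this target, `uav_mq(f, (0, STAR, STAR))` returns `0`, `uav_mq(f, (1, 1, STAR))` returns `1`, and `uav_mq(f, (1, STAR, 0))` returns `'*'`. The sketch also shows the link between the two oracles: `uav_mq` answers `'*'` on a subcube exactly when `relevant_variable` finds a variable there, since a function is non-constant on a subcube precisely when it depends on some unspecified variable within it.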