Performance bounds on the splitting algorithm for binary testing

SummaryIn machine fault-location, medical diagnosis, species identification, and computer decisionmaking, one is often required to identify some unknown object or condition, belonging to a known set of M possibilities, by applying a sequence of binary-valued tests, which are selected from a given set of available tests. One would usually prefer such a testing procedure which minimizes or nearly minimizes the expected testing cost for identification. Existing methods for determining a minimal expected cost testing procedure, however, require a number of operations which increases exponentially with M and become infeasible for solving problems of even moderate size. Thus, in practice, one instead uses fast, heuristic methods which hopefully obtain low cost testing procedures, but which do not guarantee a minimal cost solution. Examining the important case in which all M possibilities are equally likely, we derive a number of cost-bounding results for the most common heuristic procedure, which always applies next that test yielding maximum information gain per unit cost. In particular, we show that solutions obtained using this method can have expected cost greater than an arbitrary multiple of the optimal expected cost.