Discovering Rules That Govern Monotone Phenomena

Unlocking the mystery of natural phenomena is a universal objective in scientific research. The rules governing a phenomenon can usually be learned by observing it under a sufficiently large number of conditions at sufficiently high resolution. This general knowledge discovery process is not always easy or efficient, and even when knowledge is produced it may be hard to understand, interpret, validate, remember, and use. Monotonicity is a pervasive property in nature: it applies when each predictor variable has a nonnegative effect on the phenomenon under study. Because of monotonicity, a learner who can observe the phenomenon under specifically selected conditions may increase the accuracy and completeness of the acquired knowledge faster than a passive observer, who may not receive the relevant pieces of the puzzle soon enough. This scenario can be thought of as learning by successively submitting queries to an oracle that responds with a Boolean value (the phenomenon is present or absent). In practice, the oracle may be a human expert, or its response may be the outcome of a task such as running an experiment or searching a large database. Our main goal is to pinpoint the queries that minimize the total number of queries needed to completely reconstruct all of the underlying rules defined on a given finite set of observable conditions V = {0,1}^n. We summarize the optimal query selections in the simple form of selection criteria, which are near optimal and take only polynomial time (in the number of conditions) to compute. Extensive unbiased empirical results show that the proposed selection criterion approach is far superior to any of the existing methods. In fact, the average number of queries is reduced exponentially in the number of variables n and more than exponentially in the oracle's error rate.
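The query-based scenario described above can be illustrated with a minimal Python sketch. It assumes an error-free oracle and a small n, so that all of V = {0,1}^n can be enumerated. The names `infer_monotone`, `leq`, and `example_oracle` are hypothetical. The balancing rule used to pick the next query is one simple illustrative heuristic and is not claimed to be the selection criterion proposed in the paper. The key point is how monotonicity lets a single answer classify many vectors at once: a positive answer at v implies a positive value for every vector above v, and a negative answer implies a negative value for every vector below v.

```python
from itertools import product


def leq(a, b):
    """Componentwise order on Boolean vectors: a <= b."""
    return all(x <= y for x, y in zip(a, b))


def infer_monotone(oracle, n):
    """Reconstruct a monotone Boolean function on {0,1}^n by querying `oracle`.

    Assumes the oracle is error-free and monotone (nondecreasing).
    Returns the full truth table and the number of queries used.
    """
    known = {}
    unknown = set(product((0, 1), repeat=n))
    queries = 0
    while unknown:
        # Illustrative heuristic (an assumption, not the paper's criterion):
        # prefer the vector whose two possible answers would classify
        # the most balanced numbers of still-unknown vectors.
        def imbalance(v):
            up = sum(1 for w in unknown if leq(v, w))
            down = sum(1 for w in unknown if leq(w, v))
            return abs(up - down)

        v = min(sorted(unknown), key=imbalance)
        answer = oracle(v)
        queries += 1
        if answer:
            # f(v) = 1 implies f(w) = 1 for every w >= v.
            implied = {w for w in unknown if leq(v, w)}
        else:
            # f(v) = 0 implies f(w) = 0 for every w <= v.
            implied = {w for w in unknown if leq(w, v)}
        for w in implied:
            known[w] = answer
        unknown -= implied
    return known, queries


def example_oracle(v):
    """Hypothetical monotone rule: (x1 AND x2) OR x3."""
    return bool(v[0] and v[1]) or bool(v[2])


table, used = infer_monotone(example_oracle, 3)
print(f"recovered {len(table)} values with {used} queries")
```

Enumerating all 2^n vectors is only practical for small n; the sketch is meant to show the propagation of answers through the partial order, not an efficient implementation.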
