Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project

Abstraction planning

It may be possible to take a coarser view of the problem space, as though we could back off and see the general features while omitting the details. This is one form of planning (a different form from that embodied in Heuristic DENDRAL, to be described shortly). We will call it abstraction to keep the terminology clear. In the terms of our paradigm of problem solving, abstraction can be achieved by defining a new set of problem-state descriptions, based on a language that is an abstraction of the basic language in the sense that one problem-state description in the abstraction language corresponds to many problem-state descriptions in the basic language. The new view of the problem space now offers a diminished set of states and therefore reduces the complexity of the problem. Transformations for the abstraction language define the connectivity of the abstracted space. If the problem solver can discover a path from initial state to goal state in the abstracted space, he has not solved the original problem but has established a plan. Each step of the plan then becomes a problem in the original space, but the combined complexity of all these problems may be less than the complexity of the original problem.

Working backward

For some problems the number of alternatives to be searched is smaller if we begin at the goal state and, running the transformations in reverse as it were, search for the initial state. This might be the case if there is only one goal state but a multitude of initial states. Working backward is a time-honored procedure in mathematics and logic. Of course it is of no use for problems in which the transformations are not reversible, or when the goal state is defined in terms to which the transformations cannot apply, as with chess. One would be hard put to play chess by working backward from a definition of checkmate, looking for the initial board configuration.
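Working backward can be illustrated with a minimal sketch on a toy problem that is not taken from the source. It assumes states are positive integers, the forward transformations are "double" and "add 3", there is a single goal, and any member of a set of integers counts as an initial state. The function names (`inverse_moves`, `work_backward`, `apply_forward`) are hypothetical. The search runs the transformations in reverse from the goal and stops at the first initial state it meets.

```python
from collections import deque

def inverse_moves(n):
    """Yield (predecessor, forward_move) pairs: states that reach n in one step."""
    if n % 2 == 0:
        yield n // 2, "double"   # undo n -> 2n
    if n - 3 >= 1:
        yield n - 3, "add3"      # undo n -> n + 3

def work_backward(goal, initial_states):
    """Breadth-first search from the goal toward any initial state.

    Returns (start, moves) such that applying `moves` forward from `start`
    reaches `goal`, or None if no initial state is reachable.
    """
    initial_states = set(initial_states)
    queue = deque([(goal, [])])
    seen = {goal}
    while queue:
        state, moves = queue.popleft()
        if state in initial_states:
            return state, moves
        for prev, label in inverse_moves(state):
            if prev not in seen:
                seen.add(prev)
                # The forward path from prev is this move, then the known path.
                queue.append((prev, [label] + moves))
    return None

def apply_forward(start, moves):
    """Replay a list of forward moves to check a plan found by working backward."""
    for move in moves:
        start = start * 2 if move == "double" else start + 3
    return start

result = work_backward(22, {1, 5})
```

Because every inverse move yields a smaller positive integer, the backward search here is finite. The sketch relies on both transformations being invertible; as the text notes, the method does not apply when they are not.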
3.2.2.2 Heuristic generation

Generate and test

If there is a procedure that can generate candidate solutions (goal states, in our present terminology), it may be possible to solve problems by the sequential enumeration and checking of potential solutions. This method is frequently called generate and test. Note that in this paradigm it is not the states of the problem space that are generated. Indeed there need not be a problem space of the sort we have been discussing. Here we need only a space of solutions that can be generated for consideration. This paradigm is thus fundamentally different from searching the space of problem states (state-space search).

Analogs of the heuristic search methods described above apply to heuristic generation. Statistical information concerning the distribution of solutions (perhaps by categories defined in a "secondary solution description language") may be used to guide generation. If some measure of goodness is computable from a proposed solution, then a hill-climbing procedure could be used to determine appropriate ways of modifying a proposal. Similarly, if an abstracted description of the set of solutions can be constructed, it may be possible to search first for the correct solution class, and then search within that class. These methods do not exhaust the possibilities, but they provide enough structure for our discussion.

3.2.3 Multiple Sources of Knowledge

Much more problem-solving power can be achieved if there is more than one source of information that can be used. For this reason purely syntactic problem solvers are inherently less powerful than those that employ semantic information as well. The secret is not that semantic information is more important, but that two sources of guidance are better than one. Jigsaw puzzles are an appropriate image to make this procedure clear.
The problem of finding the one arrangement of all the pieces that yields the desired picture may be a very difficult and large combinatorial task. If the puzzle were done with the pieces face down, analogous to having only the "syntactic" information of piece contours, the true difficulty of the problem would become apparent. If all the pieces were of the same shape, squares or hexagons, the problem could only be solved from the "semantic" information of colors and pictured objects and would also be very difficult. Jigsaw puzzles are tractable because both these sources of information are available and can be played off against one another.

In terms of the paradigm of state-space search, different sources of information correspond to different problem-state languages. Several of the search heuristics defined above rely on these secondary languages. Hill climbing needs a language in which to define its gradient of "warmth," and statistically guided search is based on a rather general but blurry-visioned search for descriptors that have some correlational information. Abstraction planning is another way of bringing to bear different views of the problem space.

On a more general level, apart from any of these methods, is the representation problem. A problem has in general many possible representations. It may be possible to choose two or more (rather than just one) in such a way that progress in one state-space search can be transferred to another. Thus those transitions that are difficult in one representation may be bypassed by using a second and vice versa.

In the context of heuristic generation, multiple sources of knowledge have the effect of limiting generation to the intersection of the solution sets delimited by each source.
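The intersection effect described in the last paragraph can be shown with a small generate-and-test sketch. It is a toy illustration and not DENDRAL's generator. Candidates are assumed to be compositions `(c, h, o)` of carbon, hydrogen, and oxygen atoms. One knowledge source is an assumed nominal mass of 46, using integer masses 12, 1, and 16. The other is the usual saturation rule for such compounds: an even hydrogen count no greater than 2c + 2. All function names and bounds are hypothetical.

```python
from itertools import product

def generate(max_c=4, max_h=12, max_o=3):
    """Enumerate every candidate composition (c, h, o) within small bounds."""
    return product(range(max_c + 1), range(max_h + 1), range(max_o + 1))

def mass_ok(candidate, target=46):
    """Knowledge source 1: nominal mass must equal the (assumed) target."""
    c, h, o = candidate
    return 12 * c + h + 16 * o == target

def valence_ok(candidate):
    """Knowledge source 2: at least one carbon, even H count, H <= 2C + 2."""
    c, h, o = candidate
    return c >= 1 and h % 2 == 0 and h <= 2 * c + 2

def generate_and_test(tests):
    """Keep the generated candidates that pass every test supplied."""
    return [cand for cand in generate() if all(test(cand) for test in tests)]

by_mass = set(generate_and_test([mass_ok]))
by_valence = set(generate_and_test([valence_ok]))
by_both = set(generate_and_test([mass_ok, valence_ok]))

# Applying both sources together gives the intersection of the two solution sets.
print(sorted(by_both), by_both == (by_mass & by_valence))
```

Within these bounds the mass test alone admits three compositions, including (3, 10, 0), which the saturation rule rejects. This sketch filters after generation for simplicity; a generator that applied the constraints while enumerating would avoid producing the rejected candidates at all.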
