The substance module: the representation, storage, and searching of complex structures

Chemical structures are typically represented in computer programs as simple graphs, in which the atoms form a list of nodes and the bonds a list of undirected edges. While this convention can represent a large variety of chemical structures, it does not lend itself to the representation of many common substances, such as polymers, nonstoichiometric mixtures, and formulations. An extension to this convention has been developed that allows properties to be associated with a defined subgraph of a structure, called an Sgroup. This extension has been implemented in the Substance Module, a new MACCS-II module, and is used to represent and search a much broader class of chemical substances. The new representation and searching capabilities are described, and examples of their use are given.
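
The idea of attaching properties to a subgraph can be illustrated with a minimal Python sketch. It is not the Substance Module's data model: the class names (`Bond`, `Sgroup`, `Structure`), the `kind` labels, the property keys, and the use of `*` for an unspecified chain end are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bond:
    a: int  # index of one bonded atom
    b: int  # index of the other atom (the edge is undirected)


@dataclass
class Sgroup:
    kind: str              # hypothetical label, e.g. "repeat-unit"
    atoms: frozenset[int]  # the subgraph, given as a set of atom indices
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Structure:
    atoms: list[str]       # element symbols; list position is the atom index
    bonds: list[Bond]
    sgroups: list[Sgroup] = field(default_factory=list)

    def crossing_bonds(self, sg: Sgroup) -> list[Bond]:
        """Return bonds with exactly one end inside the Sgroup's atom set."""
        return [b for b in self.bonds if (b.a in sg.atoms) != (b.b in sg.atoms)]

    def sgroups_of_kind(self, kind: str) -> list[Sgroup]:
        """Return every Sgroup carrying the given (hypothetical) kind label."""
        return [sg for sg in self.sgroups if sg.kind == kind]


# A two-carbon repeat unit flanked by unspecified chain ends ("*" is a
# placeholder used only in this sketch).
repeat_unit = Sgroup(
    kind="repeat-unit",
    atoms=frozenset({1, 2}),
    properties={"subscript": "n"},
)
polymer = Structure(
    atoms=["*", "C", "C", "*"],
    bonds=[Bond(0, 1), Bond(1, 2), Bond(2, 3)],
    sgroups=[repeat_unit],
)

crossing = polymer.crossing_bonds(repeat_unit)
matches = polymer.sgroups_of_kind("repeat-unit")
```

In this sketch the plain graph (atoms and bonds) is left unchanged; the Sgroup only names a subset of atoms and carries extra properties, so a search can filter on those properties in addition to ordinary graph matching.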