Optical Structure Recognition Application Entry in Image2Structure Task

We present the Optical Structure Recognition Application (OSRA) as an entry in the Image2Structure task of TREC-CHEM. OSRA is an open source utility that converts images of chemical structures into connection tables in an established computerized molecular format. A large body of chemical information has so far remained largely inaccessible to machine data-mining techniques. One of the most common ways of describing a chemical structure in a journal publication or a patent document is a two-dimensional structure diagram, which depicts the atoms and bonds of the molecule in a human-recognizable form. While easily interpreted by a human expert, such drawings are by themselves unsuitable for use in a computer database for applications such as virtual screening and computer-aided drug development. OSRA recognizes such drawings and converts them into computer formats widely used by the chemoinformatics community.
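To make the target of the conversion concrete, the sketch below shows what a connection table records: a list of atoms and a list of bonds that refer to atoms by index, each with a bond order. It is a minimal illustration under stated assumptions, not OSRA's internal representation or any established file format. The class `ConnectionTable` and its methods are hypothetical names chosen for this example (Python 3.9+ assumed), and hydrogens are left implicit, as they usually are in drawn skeletal structures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Hypothetical minimal connection table: atoms plus bonds between atom indices."""

    atoms: list[str] = field(default_factory=list)  # atoms[i] = element symbol of atom i
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (atom a, atom b, order)

    def add_atom(self, element: str) -> int:
        """Append an atom and return its index."""
        self.atoms.append(element)
        return len(self.atoms) - 1

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        """Record a bond between two distinct, already-added atoms."""
        n = len(self.atoms)
        if a == b or not (0 <= a < n and 0 <= b < n):
            raise ValueError("a bond must join two distinct existing atoms")
        self.bonds.append((a, b, order))

    def neighbors(self, i: int) -> list[int]:
        """Indices of atoms directly bonded to atom i."""
        return [b if a == i else a for a, b, _ in self.bonds if i in (a, b)]

    def bond_order_sum(self, i: int) -> int:
        """Sum of the orders of the explicit bonds touching atom i."""
        return sum(order for a, b, order in self.bonds if i in (a, b))

    def heavy_atom_formula(self) -> str:
        """Formula of the drawn (non-hydrogen) atoms: carbon first, then alphabetical."""
        counts = Counter(self.atoms)
        order = (["C"] if "C" in counts else []) + sorted(e for e in counts if e != "C")
        return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Heavy-atom skeleton of acetic acid, CH3-C(=O)-OH, as a structure diagram draws it.
ct = ConnectionTable()
c_methyl = ct.add_atom("C")
c_carboxyl = ct.add_atom("C")
o_carbonyl = ct.add_atom("O")
o_hydroxyl = ct.add_atom("O")
ct.add_bond(c_methyl, c_carboxyl)
ct.add_bond(c_carboxyl, o_carbonyl, order=2)
ct.add_bond(c_carboxyl, o_hydroxyl)

print(ct.neighbors(c_carboxyl))       # [0, 2, 3]
print(ct.bond_order_sum(c_carboxyl))  # 4
print(ct.heavy_atom_formula())        # C2O2
```

Real molecular formats encode the same atom and bond information in their own standardized layouts; this sketch only shows the shape of the data that image recognition has to recover from a drawing.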