Rule-Based Classification Systems for Informatics

Classification of data is an important step in the evolution of scientific knowledge. Traditionally, scientific data were classified by human experts, who can recognize the unique functional properties that are necessary and sufficient to place complex structures and phenomena into a particular class or group. However, with the growth of scientific data and the rapid pace at which knowledge changes, it is no longer feasible for humans to classify all objects manually. Automating the classification process is necessary to cope with the growing volume of data; otherwise, classification will become the rate-limiting step in scientific data analysis. In this paper, we address the need for such automation in the SciAEther project and develop ChES, a fast and reproducible framework for classifying molecules in chemical data. Our framework captures human understanding through an ontology, and the diversity of classification types through a rule-based system, in order to classify complex molecular compounds. We have tested our system on molecules from the PubChem repository and found that our knowledge-based automatic classification matches, and sometimes surpasses, that of human experts.
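To make the idea of ontology-backed, rule-based classification concrete, the following is a minimal sketch. It is not the ChES implementation. It assumes that each molecule has already been reduced to a set of functional-group labels, and that each ontology class states the groups that are necessary and sufficient for membership. A class also inherits the requirements of its superclass. The class names, group labels, and the helper functions `inherited_requirements` and `classify` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class ClassRule:
    """Hypothetical ontology class with necessary-and-sufficient conditions."""
    name: str
    parent: Optional[str]          # superclass in the ontology, or None for a root
    required_groups: FrozenSet[str]


def inherited_requirements(rule: ClassRule, ontology: Dict[str, ClassRule]) -> FrozenSet[str]:
    """Collect the required groups of a class together with those of all its ancestors."""
    required = set()
    current: Optional[ClassRule] = rule
    while current is not None:
        required |= current.required_groups
        current = ontology.get(current.parent) if current.parent else None
    return frozenset(required)


def classify(groups: Iterable[str], ontology: Dict[str, ClassRule]) -> List[str]:
    """Return, sorted, every class whose inherited conditions the molecule satisfies."""
    present = set(groups)
    return sorted(
        name
        for name, rule in ontology.items()
        if inherited_requirements(rule, ontology) <= present
    )


# A toy, deliberately simplified ontology (assumed for illustration only).
EXAMPLE_ONTOLOGY: Dict[str, ClassRule] = {
    r.name: r
    for r in [
        ClassRule("organic_acid", None, frozenset({"carboxyl"})),
        ClassRule("amino_acid", "organic_acid", frozenset({"amine"})),
        ClassRule("alcohol", None, frozenset({"hydroxyl"})),
    ]
}

if __name__ == "__main__":
    # A glycine-like molecule: carboxyl + amine groups.
    print(classify({"carboxyl", "amine"}, EXAMPLE_ONTOLOGY))  # ['amino_acid', 'organic_acid']
```

Because a subclass inherits its parents' conditions, a molecule placed in `amino_acid` is automatically also an `organic_acid`. This mirrors how an ontology's class hierarchy can carry expert knowledge while the rules stay local to each class.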