Cognitive Automation of Data Science

This paper explores how an automated procedure may leverage domain knowledge and reasoning to further automate Machine Learning (ML) and Data Science in a manner that may be thought of as cognitive. To this end, we first describe key features that we believe a cognitive automation system for data science must possess. The goal of a system embodying this concept would be to extend existing data-driven approaches by incorporating knowledge from experts as well as from unstructured data, and by performing inference over that knowledge. Such a system would, for example, reason about observations made during the configuration process (such as overfitting) and respond with corrective actions driven by knowledge of the underlying analytics tool. Furthermore, the system would directly incorporate end-user constraints (e.g., the wish for explainable decisions) in order to guide the learning process. While knowledge can be contributed directly by experts (e.g., known best practices in data science), the system would also extract relevant knowledge from unstructured data by employing DeepQA systems (e.g., querying Wikipedia pages) and through interactions with the user, in order to keep pace with recent developments in data science and to support active user guidance. Finally, in the spirit of IBM Chef Watson, the system would bring the notion of creativity to automated ML by composing novel variations of existing ML techniques. The present paper discusses the main features of, and the challenges in, building such a system.