Special Issue on Discovery Science

Discovery Science (DS) is a research discipline concerned with the development, analysis and application of computational methods and tools that support the automatic or semi-automatic discovery of knowledge in scientific fields such as medicine, the natural sciences and the social sciences. To this end, DS draws on theory, methods and techniques from various fields of computer science and applied mathematics, notably algorithms and complexity, machine learning and data mining, intelligent data analysis, statistics, optimization, and databases. In contrast to conventional statistical analysis, which uses data to verify the validity of given hypotheses, discovery science focuses on the discovery of the hypotheses themselves. As such, it puts particular emphasis on increasing our understanding of the process of hypothesis formation. In terms of applications, discovery science focuses on the analysis of scientific data originating from various disciplines, as opposed to the stronger commercial focus of many data mining conferences and journals.

We are convinced that a special issue on discovery science is particularly interesting for the readership of Information Sciences because of the broad interdisciplinary scope shared by the field and the journal. The works presented in this special issue should be of interest not only to computer scientists, as the providers of computational discovery techniques, but also to experts in various application domains, who are able to contribute to and judge the value of discoveries in their respective areas. We hope that this special issue will help to strengthen the connection between these communities.

In our open call for contributions, we solicited submissions in all areas of discovery science. We received an unprecedented number of 50 submissions, which shows the liveliness of this field.
Unfortunately, this also caused unforeseen editorial difficulties, which significantly delayed our publication schedule. We take this opportunity to apologize to all authors for this delay. Of the 50 submissions received, we eventually selected 9 for inclusion in this special issue. These articles are briefly summarized below.

"Improving bag-of-visual-words image retrieval with predictive clustering trees" by Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, and Sašo Džeroski shows an application of a state-of-the-art machine learning technique to the task of image retrieval. Unlike decision trees, which group instances with respect to their target variable, and clustering trees, which group instances with respect to their closeness in input space, predictive clustering trees take both aspects into account when searching for a good division of the input space. The paper shows that ensembles of such trees can be used to good effect for constructing a representation language for images, which can then be used for image retrieval. Experiments on various image databases demonstrate that the proposed system outperforms a baseline system that uses conventional k-means clustering for this purpose, without sacrificing image retrieval efficiency and scalability.

"Characterizing facial expressions by grammars of action unit sequences – a first investigation using ABL" by Michael Siebers, Ute Schmid, Dominik Seuß, Miriam Kunz, and Stefan Lautenbacher tackles a very specific image analysis task, namely the analysis of facial expressions with respect to the emotions (in this case, pain) of their bearer. The analysis is based on so-called action units, which are facial primitives such as a wrinkled nose or depressed lip corners. While previous approaches learn to associate emotions with sets of such action units, this paper proposes the use of grammar induction for the analysis of sequences of these primitives.
The paper also investigates strategies for compacting the learned grammar rules, and examines the trade-off between the number of rules and their performance.

"Extracting opinionated (sub)features from a stream of product reviews using accumulated novelty and internal hierarchy reorganization" by Max Zimmermann, Eirini Ntoutsi, and Myra Spiliopoulou presents a finer-grained approach to sentiment analysis, i.e., to algorithms for assessing whether a given piece of text expresses a positive or negative attitude. Their work aims to automatically assess whether a product review is positive or negative towards individual features of the product. The product features are not given, but are also automatically discovered in a stream of such product reviews using