Data Mining and Knowledge Discovery in Sediment Transport

The means of data collection have never been as advanced as they are today, and neither have the numerical models we use. Feeding and calibrating models against collected measurements, however, represents only a one-way flow: from measurements to the model. Observations of a system can be analyzed further in search of the information they encode. This automated search for models that accurately describe data constitutes a new direction, identified here as data mining. In the years to come, we can expect to concentrate our efforts more and more on analyzing the data we acquire from natural or artificial sources and on mining those data for knowledge. Data mining and knowledge discovery aim to provide tools that facilitate the conversion of data into other forms, such as equations, that give a better understanding of the process generating the data. Combined with the already available understanding of the physical processes (the theory), these new models result in improved understanding, novel formulations of physical laws, and improved predictive capability. This article describes the data mining process in general, as well as an application of a data mining technique in the domain of sediment transport. Data related to the concentration of suspended sediment near a bed are analyzed by means of genetic programming. Machine-induced relationships are compared with formulations proposed by human experts and are discussed in terms of accuracy and physical interpretability.
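To make the idea of searching for equations that describe data more concrete, the following is a minimal sketch of an evolutionary search over expression trees. It is not the procedure used in this article: it relies on mutation and tournament selection only (no crossover), the function set, population size, and number of generations are arbitrary assumptions, and the data are synthetic values generated from a known quadratic rather than sediment measurements. All function names are hypothetical.

```python
import math
import operator
import random


def safe_div(a, b):
    """Protected division so every candidate expression can be evaluated."""
    return a / b if abs(b) > 1e-9 else 1.0


# Assumed function set; a real study would choose it for the problem at hand.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": safe_div}


def random_tree(rng, depth=3):
    """Grow a random expression over the variable x and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree (nested tuples) at the point x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def mse(tree, data):
    """Mean squared error; non-finite results are treated as worst possible."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    err = total / len(data)
    return err if math.isfinite(err) else float("inf")


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(data, generations=30, pop_size=60, seed=0):
    """Mutation-only evolutionary search with elitism and tournaments of 3."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    best = min(population, key=lambda t: mse(t, data))
    history = [mse(best, data)]
    for _ in range(generations):
        offspring = [best]  # elitism: the best expression always survives
        while len(offspring) < pop_size:
            parent = min(rng.sample(population, 3), key=lambda t: mse(t, data))
            offspring.append(mutate(parent, rng))
        population = offspring
        best = min(population, key=lambda t: mse(t, data))
        history.append(mse(best, data))
    return best, history


# Synthetic data from y = 0.5 * x^2 + 1 (an assumption, not measurements).
data = [(i / 10, 0.5 * (i / 10) ** 2 + 1.0) for i in range(1, 21)]
best, history = evolve(data)
print("best expression:", best)
print("error per generation (first, last):", history[0], history[-1])
```

Because the best expression is carried over unchanged into each new generation, the recorded error can never increase from one generation to the next. Whether the resulting expression is physically interpretable is a separate question from how well it fits.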