Data Mining in the Clinical Research Environment

Data mining has been widely adopted in recent years across many industries, largely because mining techniques can yield answers to business questions quickly and because large quantities of data are now available to exploit. This paper discusses data and text mining in general before focusing on applications in the clinical research field. Of particular interest is the application of mining techniques to signal detection for adverse events. The value of these techniques is discussed, along with the place of data and text mining in the overall architecture of a SAS solution for pharmacovigilance.

WHAT IS DATA MINING?

Data mining is defined by SAS as the process of selecting, exploring, and modelling large amounts of data to uncover previously unknown patterns for business advantage. To expand on this definition, it is important to realise that data mining is a continuous process in which models are built, refined and managed over a period of time. The techniques used are largely iterative and empirical in nature, which is why the process is continuous. Several different techniques are employed to gain value from the data, including graphical exploration and many different modelling and modification techniques; data mining is therefore not the same as data exploration.

Data volumes are generally very large, and data mining techniques are generally applied where the problem is not well understood and traditional parametric statistics have either failed or not been applied because of the complexity of the situation. Data mining is also often applied where the problem cannot be easily stated and a hypothesis needs to be generated. For example, the question could be “what significant associations exist between items in a typical shopping basket?” This might then lead to a more specific question such as “do people who buy nappies usually buy beer at the same time?” (This is apparently true!)
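The shopping-basket question can be made concrete with a small sketch. The Python below is not taken from the SAS tooling discussed in this paper; it is a minimal illustration, assuming a handful of invented baskets, of how candidate associations might be generated by counting item pairs and then assessed with the usual association-rule measures of support, confidence and lift. The names `baskets`, `pair_counts` and `rule_metrics` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"nappies", "beer", "milk"},
    {"nappies", "beer"},
    {"nappies", "bread"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"nappies", "beer", "bread"},
]

# Hypothesis generation: count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)
    n_c = sum(1 for b in baskets if consequent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / n                           # share of baskets with both items
    confidence = n_both / n_a if n_a else 0.0      # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # > 1 suggests a positive association
    return support, confidence, lift


if __name__ == "__main__":
    print("Most frequent pair:", pair_counts.most_common(1)[0])
    support, confidence, lift = rule_metrics(baskets, "nappies", "beer")
    print(f"nappies -> beer: support={support:.3f}, "
          f"confidence={confidence:.3f}, lift={lift:.3f}")
```

On real data the pair counting would be replaced by a scalable algorithm and the resulting rules filtered by minimum support and confidence thresholds; the sketch only shows how a vague question becomes a testable one.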
