Probabilistic, and in particular Bayesian, methods for modelling data are becoming increasingly sophisticated, driven by the demand to analyse the enormous wealth of data being produced by the biomedical sciences. In this thesis we present a variety of unsupervised generative probabilistic models loosely based around mixtures of distributions. The motivation for using these models is that the mixture reflects aspects of a biomedical process that has a number of contributing factors. We analyse gene expression data from microarrays, sequence motif data and radiological data. We attempt to model the interactions between motif data and gene expression for yeast, and we perform an in-depth analysis of gene expression data for four breast cancer datasets. The radiological data come from computed tomography scans and radiologist reports; we model the interaction between image data from the scans and textual data from the reports for a number of lung diseases. A common theme throughout this thesis is data fusion: this can be the joint modelling of two separate datasets, the comparison of equivalent datasets from independent sources, or simply the incorporation of external information into the model.