Discovering Hidden Pathways in Bioinformatics

The elucidation of biological networks that regulate the metabolic basis of disease is critical for understanding disease progression and for identifying therapeutic targets. In molecular biology, this process often starts by clustering expression profiles to obtain clusters that are candidates for disease phenotypes. However, each cluster may reflect several overlapping processes that are active at the same time. This paper presents empirical results from applying blind source separation methods to map the pathways of biomarkers that drive the independent, hidden processes underpinning the clusters. The method is applied to a protein expression data set measured in tissue from breast cancer patients (n = 1,076).
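The abstract does not say which blind source separation method is used, so the following is only a minimal sketch of the general idea, assuming non-negative matrix factorization with multiplicative updates as one possible choice. It factorizes a samples-by-markers matrix X into non-negative mixing weights W and hidden process profiles H, so that X is approximately WH. The function names (`nmf`, `matmul`, `frobenius_error`) and the toy matrix are hypothetical illustrations, not the paper's data or implementation, and the code is written in plain Python only to keep it dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative X (n x m) into W (n x k) and H (k x m).

    Uses multiplicative updates for the squared-error objective.
    Returns W, H and the reconstruction error after each iteration.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start so the updates are well defined.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Toy data (invented): 6 samples, 4 markers, 2 overlapping hidden processes.
true_h = [[1.0, 0.0, 2.0, 0.5],
          [0.0, 3.0, 0.5, 1.0]]
true_w = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5],
          [0.2, 0.8], [0.0, 1.0], [0.6, 0.4]]
x = matmul(true_w, true_h)

w, h, errors = nmf(x, k=2)
print("initial error:", errors[0])
print("final error:  ", errors[-1])
```

In this sketch each row of `h` plays the role of one hidden process (a weighting over markers) and each row of `w` gives the contribution of those processes to one sample. In practice the factorization would be run within each cluster, and the number of components `k` would need to be chosen by some model selection criterion that the abstract does not describe.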
