Matrix Factorization-Based Data Fusion for Gene Function Prediction in Baker's Yeast and Slime Mold

Developing effective methods for characterizing gene function that combine diverse data sources in a sound and easily extensible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this approach can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction data, and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in the slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.
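
The idea of sharing matrix factors between sources can be illustrated with a minimal sketch. It assumes two hypothetical relation matrices that share the gene dimension (genes by ontology terms and genes by expression conditions) and approximates each as R_j ≈ G1 S_j G_j^T, with a single gene factor G1 common to both. The function name, the ranks, the toy data, and the unconstrained alternating least-squares updates are all assumptions made for illustration; they stand in for the optimization procedure of the approach itself and are not a reproduction of it.

```python
import numpy as np


def tri_factorize_shared(R_list, k_shared, k_list, n_iter=30, seed=0):
    """Jointly approximate each R_j as G1 @ S_j @ G_j.T with a shared G1.

    Illustrative only: plain alternating least squares, no constraints.
    """
    rng = np.random.default_rng(seed)
    n_rows = R_list[0].shape[0]
    G1 = rng.random((n_rows, k_shared))                        # shared row factor
    G = [rng.random((R.shape[1], k)) for R, k in zip(R_list, k_list)]
    history = []
    for _ in range(n_iter):
        # Middle factors: least-squares solution of R_j ~ G1 S_j G_j^T for S_j.
        S = [np.linalg.pinv(G1) @ R @ np.linalg.pinv(Gj).T
             for R, Gj in zip(R_list, G)]
        # Source-specific column factors: solve R_j ~ (G1 S_j) G_j^T for G_j.
        G = [R.T @ np.linalg.pinv(G1 @ Sj).T for R, Sj in zip(R_list, S)]
        # Shared factor: solve [R_1, R_2, ...] ~ G1 [S_1 G_1^T, S_2 G_2^T, ...].
        M = np.hstack([Sj @ Gj.T for Sj, Gj in zip(S, G)])
        G1 = np.hstack(R_list) @ np.linalg.pinv(M)
        # Total squared reconstruction error over all sources.
        loss = sum(np.linalg.norm(R - G1 @ Sj @ Gj.T) ** 2
                   for R, Sj, Gj in zip(R_list, S, G))
        history.append(loss)
    return G1, S, G, history


# Toy data (hypothetical): 30 genes, 12 ontology terms, 8 expression conditions,
# generated from a common rank-3 gene representation.
rng = np.random.default_rng(42)
gene_latent = rng.random((30, 3))
R_go = gene_latent @ rng.random((3, 12))     # genes x ontology terms
R_expr = gene_latent @ rng.random((3, 8))    # genes x expression conditions

G1, S, G, history = tri_factorize_shared([R_go, R_expr], k_shared=3, k_list=[3, 3])
print(f"squared error: first iteration {history[0]:.3e}, last {history[-1]:.3e}")
```

Because G1 is fitted against both matrices at once, each source constrains the gene representation used to reconstruct the other, which is what makes the factorization a fusion rather than two independent decompositions. New gene-term associations would then be read off the reconstructed matrix `G1 @ S[0] @ G[0].T`.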
