Gene function prediction with knowledge from gene ontology

Gene function prediction is an important problem in bioinformatics. Because gene expression data are inherently noisy, attempts to improve prediction accuracy through new classification techniques alone have had limited success. With the emergence of the Gene Ontology (GO), extra knowledge about gene products can be extracted from GO to help solve the gene function prediction problem. In this paper, we propose a new method that utilises GO information to improve classifier performance in gene function prediction. Specifically, our method uses distance metric learning to learn a distance metric under the supervision of GO knowledge. Compared with traditional distance metrics, the learned metric better reflects functional similarity between genes, and classification accuracy improves as a result. Extensive experimental results corroborate the effectiveness of the proposed method.
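
To make the idea of a metric learned under GO supervision concrete, the following is a minimal sketch and not the method of the paper. It assumes that a GO-based semantic similarity matrix is already available, thresholds it to obtain pairs of functionally similar genes, and fits a Mahalanobis metric from those pairs with a simple RCA-style estimate (the inverse of the regularised covariance of within-pair differences). The function names, the threshold, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairs_from_similarity(S, threshold):
    """Return index pairs (i, j), i < j, whose similarity is at least threshold.

    S is assumed to be a symmetric gene-by-gene GO semantic similarity matrix.
    """
    n = S.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if S[i, j] >= threshold]

def learn_metric_from_similar_pairs(X, similar_pairs, reg=1e-3):
    """Fit a Mahalanobis matrix M from pairs of similar rows of X.

    RCA-style stand-in: directions in which similar genes differ a lot are
    treated as noise and receive a small weight in the metric.
    """
    diffs = np.array([X[i] - X[j] for i, j in similar_pairs])
    C = diffs.T @ diffs / len(diffs)      # covariance of within-pair differences
    C += reg * np.eye(X.shape[1])         # regularise so that C is invertible
    return np.linalg.inv(C)

def mahalanobis(a, b, M):
    """Distance between a and b under the metric M."""
    d = a - b
    return float(np.sqrt(d @ M @ d))

# Toy expression profiles (made up): feature 0 separates the two functions,
# feature 1 is large-amplitude noise.
X = np.array([[0.0, 0.0],
              [0.1, 5.0],
              [1.0, 0.2],
              [1.1, 5.1]])

# Toy GO similarity matrix (made up): genes 0 and 1 share a function, as do 2 and 3.
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])

similar_pairs = pairs_from_similarity(S, threshold=0.5)
M = learn_metric_from_similar_pairs(X, similar_pairs)

print("Euclidean  same-function:", np.linalg.norm(X[0] - X[1]))
print("Euclidean  diff-function:", np.linalg.norm(X[0] - X[2]))
print("Learned    same-function:", mahalanobis(X[0], X[1], M))
print("Learned    diff-function:", mahalanobis(X[0], X[2], M))
```

In this toy setting the Euclidean distance places gene 0 closer to a gene with a different function than to the gene that shares its function, whereas the learned metric reverses that ordering. A distance-based classifier such as k-nearest neighbours could then use the learned distance in place of the Euclidean one.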
