Disease Modeling via Large-Scale Network Analysis

Abstract: A central goal of genetics is to learn how the genotype of an organism determines its phenotype. We address the underlying problem of predicting which genes are associated with which phenotypes or traits. Our primary goal is to develop pragmatic data-analytic methods for linking specific genes to traits and diseases, especially polygenic traits, which are the most challenging. We are also interested in establishing theoretical guarantees for these methods. In past work, we developed predictive methods general enough to apply to potentially any genetic trait, from plant traits relevant to desirable agricultural properties to important human diseases. Our methods for predicting gene-disease associations, Katz on a heterogeneous network and CATAPULT [1], were published in PLOS ONE during the last project period. The biological problem has also led us to pursue a significant problem in machine learning. A fundamental question in classification is whether we can efficiently learn classifiers that provably achieve low misclassification rates when the training labels are corrupted by certain types of random noise.
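
To make the idea of a Katz measure on a heterogeneous network concrete, the following is a minimal sketch, not the published implementation. It assumes the standard truncated Katz score, the sum of beta^l * A^l over path lengths l = 1..k, computed on a block adjacency matrix that combines gene-gene, gene-disease, and disease-disease links. The function name truncated_katz, the parameters beta and max_len, and the toy matrices are all hypothetical and chosen only for illustration.

```python
import numpy as np


def truncated_katz(adjacency, beta=0.1, max_len=3):
    """Return sum_{l=1..max_len} beta**l * A**l (hypothetical helper).

    Entry (i, j) is a damped count of walks of length <= max_len
    between nodes i and j, so shorter connections weigh more.
    """
    a = np.asarray(adjacency, dtype=float)
    scores = np.zeros_like(a)
    power = np.eye(a.shape[0])
    for length in range(1, max_len + 1):
        power = power @ a  # A**length
        scores += (beta ** length) * power
    return scores


# Toy data (made up): 3 genes, 2 diseases.
gene_gene = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
gene_disease = np.array([[1, 0],   # gene 0 is linked to disease 0
                         [0, 0],   # gene 1 has no known association
                         [0, 1]])  # gene 2 is linked to disease 1
disease_disease = np.array([[0, 1],
                            [1, 0]])

# Heterogeneous network: genes first, then diseases.
network = np.block([[gene_gene, gene_disease],
                    [gene_disease.T, disease_disease]])

n_genes = gene_gene.shape[0]
katz = truncated_katz(network)

# Gene-by-disease block: candidate association scores.
gene_disease_scores = katz[:n_genes, n_genes:]
```

In this toy network, gene 1 has no known disease link, yet it receives a nonzero score for both diseases because it is connected to genes that do. Ranking genes by such scores is one plausible way to propose candidate associations; the choice of beta and max_len here is arbitrary.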