Connecting Seed Lists of Mammalian Proteins Using Steiner Trees

Multivariate experiments and genomics studies applied to mammalian cells often produce lists of genes or proteins that are altered under treatment or disease conditions relative to control or normal conditions. The members of such lists can be mapped onto known protein-protein interaction networks to produce subnetworks that "connect" the listed genes or proteins. These subnetworks are valuable to biologists because they can suggest regulatory mechanisms that are altered under different conditions. However, such subnetworks are often overloaded with nodes and links, resulting in connectivity diagrams that are illegible because of edge overlap. In this study, we address this problem by implementing an approximation to the Steiner Tree problem to connect seed lists of mammalian proteins/genes using literature-based protein-protein interaction networks. To avoid over-representation of hubs in the resulting Steiner Trees, we assign each Steiner Vertex (an intermediate node that is not in the seed list) a cost based on its degree, that is, its number of interactions in the network. We applied the algorithm to lists of genes commonly mutated in colorectal cancer to demonstrate the usefulness of this approach.
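
To illustrate the idea of a degree-penalized Steiner Tree approximation, the following is a minimal sketch and not the implementation used in this study. It assumes a greedy shortest-path heuristic (repeatedly attaching the nearest unconnected seed to the growing tree), unit edge costs, and a penalty of `alpha * degree` for passing through a non-seed node. The function name `degree_penalized_steiner`, the parameter `alpha`, and the toy network (nodes `A`, `B`, `C`, `P`, `Q`, `HUB`, `X1` to `X5`) are all hypothetical.

```python
import heapq

def degree_penalized_steiner(adj, seeds, alpha=1.0):
    """Greedy Steiner Tree sketch on an undirected graph.

    adj   : dict mapping node (string) -> set of neighbouring nodes
    seeds : iterable of terminal nodes to connect
    alpha : assumed weight of the degree penalty on non-seed nodes
    Returns (tree_nodes, tree_edges), edges as frozensets of two nodes.
    """
    seeds = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seed_set = set(seeds)

    def step_cost(v):
        # Every edge costs 1; entering a non-seed node (a candidate
        # Steiner Vertex) additionally costs alpha * degree(v).
        return 1.0 if v in seed_set else 1.0 + alpha * len(adj[v])

    tree_nodes, tree_edges = {seeds[0]}, set()
    remaining = seed_set - tree_nodes
    while remaining:
        # Multi-source Dijkstra: every node already in the tree starts at 0.
        dist = {v: 0.0 for v in tree_nodes}
        prev = {}
        heap = [(0.0, v) for v in sorted(tree_nodes)]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            if u in remaining:
                target = u  # nearest seed not yet in the tree
                break
            for w in adj[u]:
                nd = d + step_cost(w)
                if nd < dist.get(w, float("inf")):
                    dist[w], prev[w] = nd, u
                    heapq.heappush(heap, (nd, w))
        if target is None:
            raise ValueError("seeds are not all in one connected component")
        # Walk back from the new seed until the existing tree is reached.
        v = target
        while v not in tree_nodes:
            u = prev[v]
            tree_edges.add(frozenset((u, v)))
            tree_nodes.add(v)
            v = u
        remaining -= tree_nodes
    return tree_nodes, tree_edges

# Hypothetical toy network: HUB touches every seed plus five other nodes,
# while P and Q are low-degree intermediates.
pairs = [("A", "HUB"), ("B", "HUB"), ("C", "HUB"),
         ("A", "P"), ("P", "B"), ("B", "Q"), ("Q", "C")]
pairs += [("HUB", "X%d" % i) for i in range(1, 6)]
adj = {}
for a, b in pairs:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

nodes, edges = degree_penalized_steiner(adj, ["A", "B", "C"], alpha=1.0)
print(sorted(nodes))
```

In this toy network, routing through `HUB` costs 1 + 8 = 9 per visit under the assumed penalty, whereas `P` and `Q` each cost 1 + 2 = 3, so the sketch connects the seeds through the two low-degree intermediates and leaves the hub out. Setting `alpha=0` removes the penalty and reduces the sketch to an ordinary shortest-path heuristic.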