Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics

Background
Advances in biotechnology have created “big-data” situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck in translating this large number of hypotheses is that each one must be studied experimentally before its functional significance can be interpreted. Even when the predictions are estimated to be highly accurate, a biologist typically chooses which of them to study further based on factors such as the availability of reagents and resources and the possibility of formulating a reasonable hypothesis about their biological relevance. From a global perspective, say that of a federal funding agency, the choice of which prediction to study would ideally be based on which of them can make the greatest translational impact.

Results
We propose that algorithms be developed to identify which of the computationally generated hypotheses have the potential for high translational impact. This way, funding agencies and the scientific community can invest resources and drive research based on a global view of biomedical impact, without being deterred by a local view of feasibility. In short, data-analytic algorithms analyze big data and generate hypotheses. In contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact.
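The contrast between the two stages can be made concrete with a small sketch. In it, hypotheses arrive with a confidence score from a data-analytic step, and an inference-analytic step re-ranks them by predicted impact. The impact model here is a k-nearest-neighbour regressor, chosen only for brevity, over hypothetical feature vectors. The `Hypothesis` type, the features, the `min_confidence` threshold and the function names are all illustrative assumptions. They are not the model used in this paper.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


@dataclass(frozen=True)
class Hypothesis:
    """A computationally generated prediction, e.g. a predicted PPI (hypothetical type)."""
    name: str
    features: tuple    # hypothetical descriptors used to predict impact
    confidence: float  # accuracy estimate produced by the data-analytic step


def predict_impact(features, history, k=3):
    """k-nearest-neighbour regression: average the observed impact of the k
    historical hypotheses whose feature vectors are closest (Euclidean distance).

    history: list of (features, observed_impact) pairs for hypotheses whose
    impact is already known.
    """
    nearest = sorted(history, key=lambda item: dist(item[0], features))[:k]
    return mean(impact for _, impact in nearest)


def rank_by_impact(hypotheses, history, min_confidence=0.5, k=3):
    """Inference-analytic step: drop hypotheses the data-analytic step considers
    unreliable, then order the rest by predicted impact, highest first."""
    kept = [h for h in hypotheses if h.confidence >= min_confidence]
    scored = [(predict_impact(h.features, history, k), h.name) for h in kept]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Toy data: (features, observed impact) for hypotheses studied in the past.
    history = [((1.0, 1.0), 2.0), ((2.0, 2.0), 4.0),
               ((10.0, 10.0), 50.0), ((11.0, 9.0), 40.0)]
    candidates = [
        Hypothesis("A-B", (1.5, 1.5), confidence=0.95),
        Hypothesis("C-D", (10.5, 9.5), confidence=0.70),
        Hypothesis("E-F", (10.0, 10.0), confidence=0.30),
    ]
    print(rank_by_impact(candidates, history, k=2))  # ['C-D', 'A-B']
```

With these toy numbers, C-D ranks above A-B even though its confidence is lower. E-F is removed by the confidence filter. The point is that accuracy and predicted impact are separate axes, and the inference-analytic step orders hypotheses by the second.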
We demonstrate the concept by developing an algorithm that predicts the biomedical impact of protein-protein interactions (PPIs). Here, impact is estimated by the number of future publications that cite the paper that originally reported the PPI.

Conclusions
This position paper describes a new computational problem that is relevant in the era of big data. It discusses the challenges in studying this problem and highlights the need for the scientific community to engage in this line of research. The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating the computational outcomes that promise maximum biological impact. Applying this concept to predict the biomedical impact of PPIs illustrates not only the concept but also the challenges in designing these algorithms.
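The impact measure used for PPIs is the number of future publications citing the paper that originally reported the PPI. A minimal sketch of turning it into training labels follows. Several details are our own hypothetical choices, not taken from the paper:

- the tuple layouts of the inputs;
- treating the earliest report as the "original" one;
- counting distinct citing papers;
- the four-year citation window;
- the function names `original_reports` and `impact_labels`.

```python
def original_reports(reports):
    """Map each PPI to (paper_id, year) of the earliest paper reporting it.

    reports: iterable of (ppi, paper_id, year) tuples (hypothetical layout).
    """
    first = {}
    for ppi, paper, year in reports:
        if ppi not in first or year < first[ppi][1]:
            first[ppi] = (paper, year)
    return first


def impact_labels(reports, citations, window=4):
    """Label each PPI with the number of distinct papers that cite its original
    report within `window` years of that report's publication (assumed window).

    citations: iterable of (citing_paper, cited_paper, citing_year) tuples.
    """
    first = original_reports(reports)
    pub_year = {paper: year for paper, year in first.values()}
    citers = {}  # cited paper -> set of distinct citing papers inside the window
    for citing, cited, year in citations:
        if cited in pub_year and 0 <= year - pub_year[cited] <= window:
            citers.setdefault(cited, set()).add(citing)
    # Every PPI reported in the same paper inherits that paper's count.
    return {ppi: len(citers.get(paper, ())) for ppi, (paper, _) in first.items()}


if __name__ == "__main__":
    reports = [("A-B", "P1", 2000), ("A-B", "P3", 2005), ("C-D", "P2", 2002)]
    citations = [("Q1", "P1", 2001), ("Q2", "P1", 2003), ("Q2", "P1", 2003),
                 ("Q3", "P1", 2010), ("Q4", "P3", 2006), ("Q5", "P2", 2004)]
    print(impact_labels(reports, citations))  # {'A-B': 2, 'C-D': 1}
```

In this toy run, the citation from Q3 falls outside the window, and Q4 cites a later report of A-B rather than the original one. Neither is counted. All PPIs first reported in the same paper share its label, which is one reason this citation-based proxy is coarse.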