Enhancing gene co-expression network inference for the malaria parasite Plasmodium falciparum

Background: Malaria results in more than 550,000 deaths each year due to drug resistance in the most lethal Plasmodium (P.) species, P. falciparum. A full P. falciparum genome was published in 2002, yet 44.6% of its genes have unknown functions. Improving the functional annotation of genes is important for identifying drug targets and understanding the evolution of drug resistance.

Results: Genes function by interacting with one another, so analyzing gene co-expression networks can enhance functional annotations and prioritize genes for wet lab validation. Earlier efforts to build gene co-expression networks in P. falciparum have been limited to a single network inference method or to gaining biological understanding of only a single gene and its interacting partners. Here, we explore multiple inference methods and aim to systematically predict functional annotations for all P. falciparum genes. We evaluate each inferred network based on how well it predicts existing gene-Gene Ontology (GO) term annotations, using network clustering and leave-one-out cross-validation. We assess the overlap of the different networks' edges (gene co-expression relationships) as well as of their predicted functional knowledge. The networks' edges are overall complementary: 47%-85% of all edges are unique to each network. In terms of accuracy of predicting gene functional annotations, all networks yield relatively high precision (as high as 87% for the network inferred using mutual information), but the highest recall reached is below 15%. The low recall of all networks means that none of them captures a large share of the existing gene-GO term annotations. In fact, their annotation predictions are highly complementary, with the largest pairwise overlap being only 27%. We provide ranked lists of inferred gene-gene interactions and predicted gene-GO term annotations for future use and wet lab validation by the malaria community.

Conclusions: The different networks seem to capture different aspects of P. falciparum biology, in terms of both inferred interactions and predicted gene functional annotations. Thus, relying on a single network inference method should be avoided when possible.

Availability and implementation: All data and code are available at https://nd.edu/~cone/pfalGCEN/.
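To make the two kinds of comparison reported above concrete, the following is a minimal Python sketch of how the fraction of edges unique to each network and the precision and recall of predicted gene-GO term annotations could be computed. It is an illustration under stated assumptions, not the authors' pipeline (which additionally relies on network clustering and leave-one-out cross-validation). The function names, the network labels, and the toy genes and GO terms are all hypothetical; edges are assumed to be undirected gene pairs.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]        # an undirected gene-gene co-expression edge
Annotation = Tuple[str, str]  # a (gene, GO term) pair


def normalize_edges(edges: Iterable[Edge]) -> Set[Edge]:
    """Store each gene pair in sorted order so (a, b) and (b, a) match."""
    return {tuple(sorted(edge)) for edge in edges}


def unique_edge_fraction(networks: Dict[str, Iterable[Edge]]) -> Dict[str, float]:
    """For each network, the fraction of its edges found in no other network."""
    norm = {name: normalize_edges(edges) for name, edges in networks.items()}
    result = {}
    for name, edges in norm.items():
        # Union of the edges of all other networks.
        others = set().union(*(e for n, e in norm.items() if n != name))
        result[name] = len(edges - others) / len(edges) if edges else 0.0
    return result


def precision_recall(predicted: Set[Annotation],
                     known: Set[Annotation]) -> Tuple[float, float]:
    """Precision and recall of predicted annotations against known ones."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy data (hypothetical): two small networks and a few annotations.
toy_networks = {
    "mutual_information": [("g1", "g2"), ("g2", "g3"), ("g3", "g4")],
    "pearson": [("g2", "g1"), ("g4", "g5")],
}
toy_predicted = {("g1", "GO:A"), ("g2", "GO:A")}
toy_known = {("g1", "GO:A"), ("g3", "GO:B"), ("g4", "GO:B"), ("g5", "GO:C")}

print(unique_edge_fraction(toy_networks))
print(precision_recall(toy_predicted, toy_known))
```

In the toy data, half of the predictions are correct but only a quarter of the known annotations are recovered, which mirrors on a small scale the high-precision, low-recall pattern described in the Results.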
