Novel overlapping subgraph clustering for the detection of epitopes from an antigen
Bioinformatics

Motivation: Antigens containing overlapping epitopes have occasionally been reported. Because current algorithms mostly take a one-antigen-one-epitope approach to epitope prediction, they cannot accurately detect such multiple, overlapping epitopes. They also miss the multiple, separated epitopes found in some other antigens.

Results: We introduce a novel subgraph clustering algorithm for more accurate epitope detection. The algorithm takes graph partitions as seeds and expands them, merging overlapping subgraphs according to a similarity computed over term frequency-inverse document frequency (TF-IDF) features. Each merged subgraph is then classified as an epitope or a non-epitope. We tested the algorithm on three newly collected antigen datasets. In the first, each antigen contains a single epitope; in the second, each antigen contains multiple, separated epitopes; and in the third, each antigen contains overlapping epitopes. Our algorithm performs significantly better than state-of-the-art methods. The lifts in average F-score over the best existing methods are 60%, 75%, and 22% for single-epitope detection, multiple separated-epitope detection, and overlapping-epitope detection, respectively.
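The seed-expand-merge idea described in the Results can be illustrated with a small sketch. This is not the authors' algorithm, and several details here are assumptions. A subgraph is taken to be a set of residue IDs, and each residue's TF-IDF "term" is assumed to be a categorical label such as a one-letter amino-acid code. Seeds are grown by one hop along a residue adjacency map. Two subgraphs are merged when they overlap and their TF-IDF cosine similarity reaches a fixed threshold. The function names (`tfidf_vectors`, `cosine`, `expand_seeds`, `merge_overlapping`) and the `threshold` parameter are hypothetical. The final epitope/non-epitope classification step is omitted.

```python
import math
from collections import Counter

# Minimal sketch, NOT the published method. Hypothetical assumptions:
# - a subgraph is a set of residue IDs;
# - each residue's "term" is a categorical label (e.g. a one-letter amino-acid code);
# - seeds grow by one hop; merging uses smoothed TF-IDF cosine and a fixed threshold.


def tfidf_vectors(subgraphs, residue_label):
    """Treat each subgraph as a document whose terms are its residue labels."""
    docs = [Counter(residue_label[r] for r in sg) for sg in subgraphs]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # each label counted once per subgraph
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # term frequency * smoothed inverse document frequency
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_seeds(seeds, adjacency):
    """Grow each seed (a partition block) by its one-hop neighbours; results may overlap."""
    expanded = []
    for seed in seeds:
        grown = set(seed)
        for r in seed:
            grown |= adjacency.get(r, set())
        expanded.append(grown)
    return expanded


def merge_overlapping(subgraphs, residue_label, threshold=0.8):
    """Repeatedly merge one pair of overlapping subgraphs whose TF-IDF cosine
    similarity reaches `threshold`, until no such pair remains."""
    groups = [set(sg) for sg in subgraphs]
    changed = True
    while changed:
        changed = False
        vecs = tfidf_vectors(groups, residue_label)  # IDF depends on current groups
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j] and cosine(vecs[i], vecs[j]) >= threshold:
                    groups[i] |= groups.pop(j)  # absorb group j into group i
                    changed = True
                    break
            if changed:
                break  # vectors are stale after a merge; recompute them
    return groups


if __name__ == "__main__":
    # Toy residue graph: two connected components with hypothetical labels.
    labels = {0: "Y", 1: "Y", 2: "Y", 5: "K", 6: "E"}
    adjacency = {0: {1}, 1: {0, 2}, 2: {1}, 5: {6}, 6: {5}}
    seeds = [{0, 1}, {2}, {5, 6}]
    merged = merge_overlapping(expand_seeds(seeds, adjacency), labels)
    print([sorted(g) for g in merged])  # [[0, 1, 2], [5, 6]]
```

Recomputing IDF after every merge keeps the weights consistent with the current set of subgraphs. It is quadratic per pass, though. A larger-scale version would probably cache pairwise similarities and update only the rows touched by each merge.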
