Overcoming analytical reliability issues in clinical proteomics using rank-based network approaches

Proteomics is poised to play critical roles in clinical research. However, because of limited coverage and high noise, proteomics data must be paired with powerful analysis algorithms. In particular, network-based algorithms can improve the selection of reproducible features despite incomplete proteome coverage, technical inconsistency or high inter-sample variability. We define analytical reliability using three benchmarks: precision/recall rates, feature-selection stability and cross-validation accuracy. Using these, we demonstrate the shortcomings of the commonly used Student’s t-test and hypergeometric enrichment. Given advances in sample sizes, quantitation accuracy and coverage, we are now able to introduce and evaluate Rank-Based Network Approaches (RBNAs) in proteomics for the first time. These include SNET (SubNETwork), FSNET (Fuzzy SNET) and PFSNET (Paired FSNET). We also introduce PPFSNET (sample-paired PFSNET), a novel paired-sample variant of PFSNET. RBNAs (particularly PFSNET and PPFSNET) excelled on all three benchmarks and made consistent, reproducible predictions even in the small-sample-size scenario (n = 4). Given these qualities, RBNAs represent an important advance in network biology and are expected to see practical use, particularly in clinical biomarker and drug-target prediction.
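The following minimal Python sketch illustrates the "rank-based" idea. It shows fuzzy rank weighting of proteins within each sample and a PFSNET-style paired subnetwork score. It is an illustration under stated assumptions, not the authors' implementation. The thresholds (`ALPHA1 = 0.05`, `ALPHA2 = 0.15`), the linear weight ramp between them, and the helper names `fuzzy_weights`, `paired_subnetwork_scores` and `paired_t` are hypothetical. The scoring loosely follows published descriptions of PFSNET, in which a subnetwork's score in a sample sums each protein's fuzzy weight multiplied by that protein's mean weight within a reference class.

```python
import math
from statistics import mean, stdev

# Hypothetical thresholds (assumption): the top ALPHA1 fraction of proteins in a
# sample get weight 1.0; weights then fall linearly to 0.0 at the ALPHA2 fraction.
ALPHA1, ALPHA2 = 0.05, 0.15


def fuzzy_weights(abundances, alpha1=ALPHA1, alpha2=ALPHA2):
    """Map one sample's protein abundances to fuzzy weights based only on rank."""
    n = len(abundances)
    order = sorted(range(n), key=lambda i: abundances[i], reverse=True)
    weights = [0.0] * n
    for rank, idx in enumerate(order):
        q = rank / n  # fraction of proteins ranked above this one
        if q < alpha1:
            weights[idx] = 1.0
        elif q < alpha2:
            weights[idx] = (alpha2 - q) / (alpha2 - alpha1)
    return weights


def class_support(weight_rows, protein):
    """Mean fuzzy weight of one protein across all samples of a class."""
    return mean(row[protein] for row in weight_rows)


def paired_subnetwork_scores(weights_a, weights_b, subnet):
    """Score a subnetwork in every sample against class A and class B.

    Assumed PFSNET-style score: sum over subnetwork proteins of
    (weight in this sample) * (mean weight in the reference class).
    Returns one (A-score minus B-score) difference per sample.
    """
    beta_a = {g: class_support(weights_a, g) for g in subnet}
    beta_b = {g: class_support(weights_b, g) for g in subnet}
    diffs = []
    for row in weights_a + weights_b:
        score_a = sum(row[g] * beta_a[g] for g in subnet)
        score_b = sum(row[g] * beta_b[g] for g in subnet)
        diffs.append(score_a - score_b)
    return diffs


def paired_t(diffs):
    """One-sample t-statistic on the per-sample differences (needs >= 2 values)."""
    m = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


# Toy data: 20 proteins, 4 samples per class; proteins 0-2 form the subnetwork.
# In class A they are the most abundant proteins; in class B the least abundant.
class_a = [[50.0 + k - i if i < 3 else float(i) for i in range(20)] for k in range(4)]
class_b = [[float(i + k) for i in range(20)] for k in range(4)]

weights_a = [fuzzy_weights(s) for s in class_a]
weights_b = [fuzzy_weights(s) for s in class_b]
t_stat = paired_t(paired_subnetwork_scores(weights_a, weights_b, [0, 1, 2]))
print(f"t = {t_stat:.3f}")
```

The weights depend only on within-sample ranks. Any monotone transformation of the abundances, such as a log transform, therefore leaves them unchanged. This plausibly helps rank-based scoring tolerate scale differences between runs. In a real analysis, the t-statistic would be judged against a null distribution, for example one obtained by permuting class labels. That step is omitted here.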