MetaFS: Performance assessment of biomarker discovery in metaproteomics

Metaproteomic data are high-dimensional and sparse. Data reduction methods can identify the subset of features that differ significantly between conditions and reduce data redundancy, and feature selection (FS) methods are commonly applied to obtain this subset. A variety of FS methods have been developed for metaproteomic studies. However, because the performance of an FS method depends heavily on the characteristics of the data in a given study, a suitable method must be chosen carefully to obtain reproducible differential proteins. Moreover, it is critical to evaluate each FS method against comprehensive criteria, because a single criterion is not sufficient to reflect its overall performance. We therefore developed an online tool, MetaFS, which provides 13 types of FS methods and comprehensively evaluates them using four widely accepted and independent criteria. The function and reliability of MetaFS were systematically tested and validated in two case studies. In sum, MetaFS can help identify the FS method with the best overall performance for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.
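
To make the idea of scoring an FS method against an explicit criterion concrete, the following is a minimal sketch, not the MetaFS implementation. The abstract does not name the 13 methods or the four criteria, so the sketch assumes one simple FS method (ranking features by a Welch t-statistic) and one illustrative criterion (stability of the selected feature set under random subsampling, measured by Jaccard overlap). All function names and the toy data generator are hypothetical.

```python
import random
import statistics
from itertools import combinations


def welch_t(a, b):
    """Absolute Welch t-statistic for two lists of values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_top_k(samples, labels, k):
    """Assumed FS method: rank features by |t| between groups 0 and 1, keep the top k."""
    scores = []
    for j in range(len(samples[0])):
        g0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        g1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        scores.append((welch_t(g0, g1), j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def jaccard(a, b):
    """Jaccard overlap of two non-empty feature-index sets."""
    return len(a & b) / len(a | b)


def stability(samples, labels, k, n_rounds=10, frac=0.8, seed=0):
    """Assumed criterion: mean pairwise Jaccard overlap of the feature sets
    selected on random, class-balanced subsamples of the data."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    selections = []
    for _ in range(n_rounds):
        keep = (rng.sample(idx0, int(frac * len(idx0)))
                + rng.sample(idx1, int(frac * len(idx1))))
        sub_x = [samples[i] for i in keep]
        sub_y = [labels[i] for i in keep]
        selections.append(select_top_k(sub_x, sub_y, k))
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_toy_data(n_per_group=20, n_features=50, n_true=5, shift=3.0, seed=1):
    """Synthetic two-group data: the first n_true features are shifted in group 1."""
    rng = random.Random(seed)
    samples, labels = [], []
    for y in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if y == 1:
                for j in range(n_true):
                    row[j] += shift
            samples.append(row)
            labels.append(y)
    return samples, labels


if __name__ == "__main__":
    x, y = make_toy_data()
    print("selected features:", sorted(select_top_k(x, y, k=5)))
    print("stability score:", stability(x, y, k=5))
```

A fuller comparison would repeat this for each candidate FS method and for several independent criteria, since a method that scores well on one criterion may do poorly on another.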
