MVApp—Multivariate Analysis Application for Streamlined Data Analysis and Curation

MVApp offers a free and collaborative platform for streamlined curation and analysis of plant phenotyping datasets. Modern phenotyping techniques yield vast amounts of data that are challenging to manage and analyze. When thoroughly examined, such data can reveal genotype-to-phenotype relationships and meaningful connections among individual traits. However, efficient data mining is difficult for experimental biologists with limited training in curating, integrating, and exploring complex datasets. Data transparency, accessibility, and reproducibility are also important considerations for scientific publication. There is therefore a pressing need for a streamlined, user-friendly pipeline for advanced phenotypic data analysis. In this article, we present an open-source online platform for multivariate analysis (MVApp), which serves as an interactive pipeline for data curation, in-depth analysis, and customized visualization. MVApp builds on available R packages and adds functionality that enhances the interpretability of the results. Its modular design allows flexible analysis of various data structures and includes tools that are underexplored in phenotypic data analysis, such as clustering and quantile regression. MVApp aims to make data more findable, accessible, interoperable, and reproducible, to streamline data curation and analysis, and to increase statistical literacy in the scientific community.
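Quantile regression, one of the less common tools highlighted above, models a chosen conditional quantile (for example the median or the 90th percentile of a trait) rather than the conditional mean, which makes it less sensitive to extreme observations. The following is a minimal, dependency-free Python sketch, not MVApp's code (MVApp is built on R packages). It fits a straight line at quantile `tau` by exhaustive search, relying on the known property that some optimal quantile-regression solution passes exactly through as many observations as there are coefficients (two, for intercept and slope). The function names `check_loss` and `quantile_fit` are hypothetical.

```python
from itertools import combinations


def check_loss(residual, tau):
    # Pinball (check) loss: tau * r when r >= 0, (tau - 1) * r when r < 0
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_fit(xs, ys, tau):
    """Fit y = a + b*x at quantile tau by brute force (hypothetical helper).

    Some optimal line passes through two observations with distinct x,
    so checking every such line finds an optimum. Runtime is O(n^3),
    which suits only tiny illustrative datasets; practical tools solve
    the underlying linear program instead.
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(check_loss(y - (a + b * x), tau) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Median (tau = 0.5) fit on made-up data with one extreme value:
# the fitted line follows the bulk of the points instead of the outlier.
a, b = quantile_fit([1, 2, 3, 4, 5], [1, 2, 3, 4, 100], tau=0.5)
print(f"intercept={a:.2f}, slope={b:.2f}")
```

On this made-up example the median fit is the line y = x, whereas an ordinary least-squares fit would be pulled steeply upward by the single value of 100. Changing `tau` to, say, 0.9 would instead describe the upper range of the response.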
