Data management challenges for artificial intelligence in plant and agricultural research

Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies are validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, particularly the application of Machine Learning (ML), which holds much promise for this domain.
