NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources.

INTRODUCTION Sharing traditional knowledge with the scientific community could refine scientific approaches to phytochemical investigation and conservation of ethnomedicinal plants. A single platform that integrates traditional knowledge with scientific data is therefore greatly needed. However, ethnomedicinal data are available in heterogeneous formats, which depend on cultural aspects, survey methodology and the focus of the study. Phytochemical and bioassay data are also available from many open sources in various standard and customised formats.

OBJECTIVE To design a flexible data model that could integrate both primary and curated ethnomedicinal plant data from multiple sources.

MATERIALS AND METHODS The current model is based on MongoDB, a Not only Structured Query Language (NoSQL) database. Although MongoDB does not enforce a schema, modifications were made so that the model could incorporate both standard and customised ethnomedicinal plant data formats from different sources.

RESULTS The model presented can integrate both primary and secondary data related to ethnomedicinal plants. Disparate data were accommodated through a feature of the database that allows each document to have a different set of fields. The same feature allowed storage of similar data having different properties.

CONCLUSION The model presented can scale to a highly complex level as the database continues to mature, and is applicable to storing, retrieving and sharing ethnomedicinal plant data. It can also serve as a flexible alternative to a relational, normalised database.
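The per-document field flexibility described in the results can be illustrated with a minimal sketch. This is not the authors' model: the field names (`source_type`, `part_used`, `external_ids`, and so on) and all values are hypothetical placeholders, and a plain Python list of dictionaries stands in for a MongoDB collection so the example runs without a database server.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative placeholders,
# not data or schema taken from the paper.
survey_record = {            # primary data, e.g. from a field survey
    "species": "Azadirachta indica",
    "source_type": "primary",
    "local_name": "neem",
    "part_used": "leaf",
    "informant_count": 12,
}
curated_record = {           # secondary data, e.g. from a curated open source
    "species": "Azadirachta indica",
    "source_type": "curated",
    "compound": "azadirachtin",
    "external_ids": {"database": "example-db", "id": "EX-0001"},
}

# One "collection" holds both documents even though their fields differ.
collection = [survey_record, curated_record]


def find(docs, **criteria):
    """Return documents that contain every given field with an equal value.

    Documents that lack a queried field are skipped rather than raising.
    """
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]


def fields_by_source(docs):
    """Collect the set of field names seen for each source_type."""
    seen = defaultdict(set)
    for d in docs:
        seen[d["source_type"]].update(d.keys())
    return dict(seen)


same_species = find(collection, species="Azadirachta indica")
with_compound = find(collection, compound="azadirachtin")
```

Querying on a shared field (`species`) returns both records, while querying on a field that only one source provides (`compound`) returns only that record. With a running MongoDB server, PyMongo's `Collection.insert_many` and `Collection.find` would play the roles of the list and the `find` helper here.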
