A Data Mining Approach for Standardization of Collectors Names in Herbarium Database

Databases of botanical scientific collections are vital to the study of biodiversity. The records they hold support many kinds of biological research and serve as evidence that species occur in nature. Despite the steady growth in the volume of data available from the scientific collections of research institutions and their herbaria, data quality is still far from ideal, and cleaning these data demands considerable effort from researchers. This paper presents a methodology for assessing and identifying suspicious records and for standardizing the names of specimen collectors. The methodology applies data mining, specifically association rule analysis using the Apriori algorithm. The case study was performed on the Jabot database of the Rio de Janeiro Botanic Garden Research Institute.
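As a rough illustration of the technique named above, the following is a minimal sketch of Apriori-style association rule mining in plain Python. It is not the authors' implementation. The records, the `attribute=value` item encoding, the function names `apriori` and `rules`, and the thresholds are all hypothetical and chosen only for illustration. Each record is treated as a transaction, frequent itemsets are grown level by level, and rules are kept when their confidence passes a threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so this lookup is safe.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, sup, confidence))
    return found


# Hypothetical specimen records; names and attributes are made up.
records = [
    {"collector=Silva, J.", "team=Souza, M.", "state=RJ"},
    {"collector=Silva, J.", "team=Souza, M.", "state=RJ"},
    {"collector=J. Silva", "team=Souza, M.", "state=RJ"},
    {"collector=Silva, J.", "team=Souza, M.", "state=RJ"},
    {"collector=Lima, A.", "team=Costa, P.", "state=MG"},
    {"collector=Lima, A.", "team=Costa, P.", "state=MG"},
]

frequent = apriori(records, min_support=0.3)
found = rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), round(sup, 2), round(conf, 2))
```

In this toy data the spelling `J. Silva` falls below the support threshold while sharing its team and state with the frequent spelling `Silva, J.`. Under the assumptions of this sketch, that kind of rare variant in a well-supported context is one plausible signal for a record worth reviewing; how the paper itself encodes records and selects rules is not stated in the abstract.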
