Data quality concepts and techniques applied to taxonomic databases

Data Quality Concepts and Techniques Applied to Taxonomic Databases, by Eduardo Couto Dalcin

The thesis investigates the application of concepts and techniques of data quality in taxonomic databases to enhance the quality of information services and systems in taxonomy. Taxonomic data are arranged and introduced in Taxonomic Data Domains in order to establish a standard and a working framework to support the proposed Taxonomic Data Quality Dimensions, as a specialised application of conventional Data Quality Dimensions in the Taxonomic Data Quality Domains.

The thesis presents a discussion about improving data quality in taxonomic databases, considering conventional Data Cleansing techniques and applying generic data content error patterns to taxonomic data. Techniques of taxonomic error detection are explored, with special attention to scientific name spelling errors. The spelling error problem is scrutinized through spelling error detecting techniques and algorithms. Spelling error detection algorithms are described and analysed. In order to evaluate the applicability and efficiency of different spelling error detection algorithms,
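
As an illustration of what spelling error detection on scientific names can look like, the following is a minimal Python sketch. It assumes a restricted Damerau-Levenshtein (optimal string alignment) edit distance and a threshold of two edits; both are choices made here for illustration, since the abstract does not say which algorithms the thesis evaluates. The function names `osa_distance` and `suspect_pairs` and the sample names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions and transpositions of two
    adjacent characters each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "se" <-> "es".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def suspect_pairs(names, max_distance=2):
    """Return (name, name, distance) for distinct names that are close.

    The threshold max_distance is an assumption of this sketch, not a
    value taken from the thesis.
    """
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            dist = osa_distance(first.lower(), second.lower())
            if dist <= max_distance:
                pairs.append((first, second, dist))
    return pairs


# Hypothetical sample: two of these names are misspelt variants.
sample = [
    "Vicia faba",
    "Vicia fabba",
    "Vicia sativa",
    "Phaseolus vulgaris",
    "Phaesolus vulgaris",
]
for first, second, dist in suspect_pairs(sample):
    print(f"{first!r} ~ {second!r} (distance {dist})")
```

A pair flagged this way is only a candidate for review: the distance cannot tell which spelling is correct, and two valid names may legitimately differ by one or two letters. The pairwise comparison is also quadratic in the number of distinct names, so a large checklist would need some form of blocking (for example, comparing only names within the same genus).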
