Computer aided identification of biological specimens using self-organizing maps

For scientific or socio-economic reasons it is often necessary or desirable to identify biological material. Given that there are an estimated 10 million species of living organisms on Earth, identifying such material can be problematic, and the services of specialist taxonomists are often required. If such expertise is not readily available, an identification must be attempted by an alternative method, and some of these methods are unsatisfactory or can lead to a wrong identification. One of the most common problems encountered when identifying specimens is that important diagnostic features are not easily observed, or may even be completely absent. A number of techniques can be used to try to overcome this problem. One of them, the Self-Organizing Map (SOM), is particularly appealing because of its ability to handle missing data. This thesis explores the use of SOMs as a technique for identifying indigenous tree species of the genus Acacia in KwaZulu-Natal, South Africa. The ability of the SOM technique to perform exploratory data analysis through data clustering is utilized and assessed, as is its usefulness for visualizing the results of the analysis of numerical, multivariate botanical data sets. The SOM's ability to investigate, discover and interpret relationships within these data sets is examined, and the technique's ability to identify tree species successfully is tested. The same data sets are also tested using the C5 and CN2 classification techniques, and the results from both are compared with those obtained using a commercial SOM package. These results indicate that applying the SOM to the problem of biological identification could provide the start of the long-awaited breakthrough in computerized identification that biologists have eagerly been seeking.
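To make the missing-data property concrete, the following is a minimal sketch of one common way a SOM can tolerate absent features: the distance to each map unit is computed over the observed features only, and only those features are updated. It is not the commercial package used in the thesis, and every name and parameter here (`train_som`, `best_matching_unit`, the grid size, the learning-rate and neighbourhood schedules) is an assumption chosen for illustration.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Index of the unit closest to x, comparing observed (non-NaN) features only."""
    mask = ~np.isnan(x)
    diff = weights[:, mask] - x[mask]
    return int(np.argmin((diff ** 2).sum(axis=1)))

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM on data in which NaN marks a missing feature."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    weights = rng.random((rows * cols, d))
    # Grid coordinates of each map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = data[i]
            mask = ~np.isnan(x)
            step += 1
            if not mask.any():
                continue  # a specimen with no observed features carries no information
            frac = (step - 1) / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            # Update only the features that were actually observed for this specimen.
            weights[:, mask] += lr * h[:, None] * (x[mask] - weights[:, mask])
    return weights
```

A specimen with an unobserved character can then still be placed on the map by calling `best_matching_unit(weights, x)` with `NaN` in the missing position; the unit's weight for that feature can be read as the map's estimate of the missing value.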

[86]  T. A. Paine,et al.  User's guide to the Delta system: a general system for processing taxonomic descriptions , 1993 .

[87]  Ajit Narayanan,et al.  Mining viral protease data to extract cleavage knowledge , 2002, ISMB.

[88]  Douglas B. Kell,et al.  Plant seed classification using pyrolysis mass spectrometry with unsupervised learning: The application of auto-associative and Kohonen artificial neural networks , 1996 .

[89]  Saman K. Halgamuge,et al.  Knowledge Discovery With Supervised and Unsupervised Self Evolving Neural Networks , 1998 .

[90]  Pasi Koikkalainen,et al.  Tree Structured Self-Organizing Maps , 1999 .

[91]  T. A. Paine,et al.  Delta user's guide: a general system for processing taxonomic descriptions. , 1993 .

[92]  D. Janzen,et al.  Use of DNA barcodes to identify flowering plants. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[93]  Guido Deboeck Software Tools for Self-Organizing Maps , 1998 .

[94]  Tommy W. S. Chow,et al.  An online cellular probabilistic self-organizing map for static and dynamic data sets , 2004, IEEE Transactions on Circuits and Systems I: Regular Papers.

[95]  Peter Clark,et al.  Induction in Noisy Domains , 1987, EWSL.

[96]  Ted Shortliffe,et al.  Some considerations for the implementation of knowledge-based expert systems , 1975, SGAR.

[97]  A. C. Scott,et al.  Evaluating the performance of a computer-based consultant. , 1979, Computer programs in biomedicine.

[98]  Teuvo Kohonen,et al.  Software Tools for SOM , 2001 .

[99]  Joshua Lederberg,et al.  DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation , 1993, Artif. Intell..

[100]  Peter Clark,et al.  Rule Induction with CN2: Some Recent Improvements , 1991, EWSL.

[101]  Andreas Rauber,et al.  Uncovering hierarchical structure in data using the growing hierarchical self-organizing map , 2002, Neurocomputing.

[102]  Joshua Lederberg How DENDRAL was conceived and born , 1990 .

[103]  Nada Lavrac,et al.  Selected techniques for data mining in medicine , 1999, Artif. Intell. Medicine.

[104]  T. Kohonen,et al.  Bibliography of Self-Organizing Map SOM) Papers: 1998-2001 Addendum , 2003 .

[105]  Tariq Samad,et al.  Feature map learning with partial training data , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[106]  Kevin J. Gaston,et al.  Driving miss daisy: the performance of an automated insect identification system. , 2000 .

[107]  T. Kohonen,et al.  Exploratory Data Analysis by the Self-Organizing Map: Structures of Welfare and Poverty in the World , 1996 .

[108]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[109]  Peter A. Flach,et al.  Rule induction for subgroup discovery with CN2-SD , 2002 .

[110]  F. Boero Light after dark: the partnership for enhancing expertise in taxonomy. , 2001, Trends in ecology & evolution.

[111]  P. V. Wyk,et al.  Field Guide to Trees of Southern Africa , 1997 .

[112]  Edward H. Shortliffe,et al.  A model of inexact reasoning in medicine , 1990 .

[113]  Vili Podgorelec,et al.  Decision Trees: An Overview and Their Use in Medicine , 2002, Journal of Medical Systems.

[114]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[115]  O. Kandler,et al.  Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[116]  Q. Wheeler,et al.  Taxonomic triage and the poverty of phylogeny. , 2004, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[117]  Lawrence M. Fagan,et al.  Computerized consultation system for selection of antimicrobial therapy. , 1976, American journal of hospital pharmacy.

[118]  Luís B. Almeida,et al.  Improving the Learning Speed in Topological Maps of Patterns , 1990 .

[119]  M. Luckow,et al.  Acacia : the case against moving the type to Australia , 2005 .

[120]  Bernd Fritzke,et al.  Growing cell structures--A self-organizing network for unsupervised and supervised learning , 1994, Neural Networks.

[121]  M. J. Dallwitz,et al.  A Comparison of Matrix-Based Taxonomic Identification Systems with Rule-Based Systems , 1992 .

[122]  P. Brown,et al.  DNA arrays for analysis of gene expression. , 1999, Methods in enzymology.

[123]  Nikola Kasabov,et al.  Foundations Of Neural Networks, Fuzzy Systems, And Knowledge Engineering [Books in Brief] , 1996, IEEE Transactions on Neural Networks.

[124]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[125]  John W. Sammon,et al.  A Nonlinear Mapping for Data Structure Analysis , 1969, IEEE Transactions on Computers.

[126]  George Hripcsak,et al.  Knowledge discovery and data mining to assist natural language understanding , 1998, AMIA.

[127]  Nils J. Nilsson,et al.  Artificial Intelligence: A New Synthesis , 1997 .

[128]  R. Leakey,et al.  The Sixth Extinction: Patterns of Life and the Future of Humankind , 1995 .

[129]  Luís M. Santos,et al.  IDENTIFICATION BY GABOR FEATURES , 2002 .

[130]  Alfred Ultsch,et al.  The architecture of emergent self-organizing maps to reduce projection errors , 2005, ESANN.

[131]  Jacek M. Zurada,et al.  Introduction to artificial neural systems , 1992 .

[132]  Waldo Fajardo Contreras,et al.  An application of expert systems to botanical taxonomy , 2003, Expert Syst. Appl..

[133]  Teuvo Kohonen,et al.  The self-organizing map , 1990, Neurocomputing.

[134]  David West,et al.  A comparison of SOM neural network and hierarchical clustering methods , 1996 .

[135]  P. Werbos,et al.  Beyond Regression : "New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[136]  N. D. Stone,et al.  Application of Artificial Intelligence to Systematics: Systex—A Prototype Expert System for Species Identification , 1987 .

[137]  Alfred Ultsch,et al.  Self-Organizing-Feature-Maps versus Statistical Clustering Methods: A Benchmark , 1994 .

[138]  Jean Paul Haton,et al.  Expert systems : principles and practice , 1988 .

[139]  Jonathan Y. Clark,et al.  Artificial neural networks for species identification by taxonomists. , 2003, Bio Systems.

[140]  K. C. Palgrave Trees of Southern Africa , 1977 .

[141]  Alfred Ultsch,et al.  Knowledge Extraction from Self-Organizing Neural Networks , 1993 .

[142]  S. Lek,et al.  Applications of artificial neural networks for patterning and predicting aquatic insect species richness in running waters , 2003 .

[143]  William W. L. Cheung,et al.  A Fuzzy Logic Expert System to Estimate Intrinsic Extinction Vulnerabilities of Marine Fishes to Fishing , 2004 .

[144]  P. Törönen,et al.  Analysis of gene expression data using self‐organizing maps , 1999, FEBS letters.

[145]  Roman Bek,et al.  Discourse on one way in which a quantum-mechanics language on the classical logical base can be built up , 1978, Kybernetika.

[146]  Kevin J. Gaston,et al.  Image analysis, neural networks, and the taxonomic impediment to biodiversity studies , 1997, Biodiversity & Conservation.

[147]  P. Hebert,et al.  Identification of Birds through DNA Barcodes , 2004, PLoS biology.

[148]  L. Zadeh,et al.  Outline of a theory of usuality based on fuzzy logic , 1996 .

[149]  Sovan Lek,et al.  A comparison of self-organizing map algorithm and some conventional statistical methods for ecological community ordination , 2001 .

[150]  Randall Davis,et al.  A DSS for diagnosis and therapy , 1977, DATB.

[151]  Arlindo L. Oliveira,et al.  Biclustering algorithms for biological data analysis: a survey , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[152]  M. Gerstein,et al.  The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function? , 2000, Current opinion in structural biology.

[153]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[154]  M. J. Dallwitz,et al.  A Comparison of Interactive Identification Programs , 2000 .

[155]  R. Leakey,et al.  THE SIXTH EXTINCTION , 1996 .

[156]  Antony Browne,et al.  Biological data mining with neural networks: implementation and application of a flexible decision tree extraction algorithm to genomic problem domains , 2004, Neurocomputing.

[157]  Jason F. Schreer,et al.  Classification of Dive Profiles: A Comparison of Statistical Clustering Techniques and Unsupervised Artificial Neural Networks , 1998 .

[158]  E. Phillips The genera of South African flowering plants , 1926 .

[159]  P. Hebert,et al.  Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species , 2003, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[160]  R. Brummitt Report of the committee for spermatophyta: 55. Proposal 1584 on Acacia , 2004 .

[161]  Alfred Ultsch,et al.  Knowledge Extraction from Artificial Neural Networks and Applications , 1993, Transputer-Anwender-Treffen.

[162]  A.W.G. Duller,et al.  A new approach to automated pollen analysis , 2000 .

[163]  C. Malsburg Self-organization of orientation sensitive cells in the striate cortex , 2004, Kybernetik.

[164]  O. A. Leistner,et al.  Seed plants of southern Africa: Families and genera , 2001 .

[165]  Alfred Ultsch,et al.  Integration of neural networks with knowledge-based systems , 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[166]  Melody Y. Kiang,et al.  Extending the Kohonen self-organizing map networks for clustering analysis , 2002 .

[167]  Samuel Kaski,et al.  Bibliography of Self-Organizing Map (SOM) Papers: 1981-1997 , 1998 .

[168]  Bala Srinivasan,et al.  Dynamic self-organizing maps with controlled growth for knowledge discovery , 2000, IEEE Trans. Neural Networks Learn. Syst..

[169]  Edward H. Shortliffe,et al.  Mycin: A Knowledge-Based Computer Program Applied to Infectious Diseases , 1977 .

[170]  D. Tautz,et al.  A plea for DNA taxonomy , 2003 .

[171]  J. Ross Quinlan,et al.  Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.

[172]  Lawrence M. Fagan,et al.  Antimicrobial selection by a computer. A blinded evaluation by infectious diseases experts. , 1979, JAMA.

[173]  Young-Seuk Park,et al.  Predicting the species richness of aquatic insects in streams using a limited number of environmental variables , 2003, Journal of the North American Benthological Society.

[174]  Risto Miikkulainen,et al.  Incremental grid growing: encoding high-dimensional structure into a two-dimensional feature map , 1993, IEEE International Conference on Neural Networks.