Determining the saliency of feature measurements obtained from images of sedimentary organic matter for use in its classification

The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of the image analysis (IA) features measured from them. Knowing the saliency of each IA feature means that only the most discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many input features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of the morphological and textural descriptors used in previous manual classification schemes. Some of these descriptors are mapped to IA features, and several further features built into the IA software (Halcon) are added to ensure that a valid cross-section of features is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable size, the Exhaustive CHAID (χ² automatic interaction detector) classification tree algorithm in SPSS AnswerTree is used to establish the saliency of each measurement as a classification discriminator. Because the data used here are continuous, the F-test is used in place of the χ² test of the published algorithm. The F-test checks statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best case, the 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
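To make the F-test screening step more concrete, the following is a minimal sketch, assuming a simplified univariate procedure: each feature is scored with a one-way ANOVA F statistic (between-class mean square divided by within-class mean square), and the highest-scoring features are kept. It is not the Exhaustive CHAID implementation in SPSS AnswerTree; it performs no category merging, recursive splitting or Bonferroni adjustment. The functions `anova_f` and `rank_features`, the feature names, the class labels and all measurements are hypothetical toy values chosen for illustration.

```python
from collections import defaultdict
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature grouped by class label.

    F = (SS_between / (k - 1)) / (SS_within / (n - k)),
    where k is the number of classes and n the number of particles.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more particles than classes")

    grand_mean = fmean(values)
    class_means = {label: fmean(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (class_means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - class_means[label]) ** 2
                    for label, g in groups.items() for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # Zero within-class spread: perfectly separated or constant feature.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_features(rows, labels, names, top_k):
    """Return the top_k (F, feature name) pairs, highest F first."""
    scores = [(anova_f([row[j] for row in rows], labels), name)
              for j, name in enumerate(names)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: three illustrative OM classes, two particles each,
# and three made-up IA features per particle.
names = ["area", "mean_grey", "noise"]
labels = ["amorphous", "amorphous", "phytoclast", "phytoclast",
          "palynomorph", "palynomorph"]
rows = [
    [10.0, 50.0, 3.0],
    [12.0, 52.0, 5.0],
    [14.0, 120.0, 4.0],
    [11.0, 118.0, 3.0],
    [20.0, 200.0, 5.0],
    [22.0, 203.0, 4.0],
]

for f_value, name in rank_features(rows, labels, names, top_k=2):
    print(f"{name}: F = {f_value:.2f}")
```

On this toy data the ranking lists `mean_grey` first and `area` second, while the uninformative `noise` feature (F = 0.5) is dropped. The analogous step in the best case reported above would keep 8 of the 194 features (`top_k=8`). Because every feature is measured on the same particles grouped by the same classes, all features share the same degrees of freedom, so ordering by F gives the same order as ordering by p-value.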
