Phenetic Clustering in Biology: A Critique

Phenetic clustering, the forming of hierarchical nonoverlapping groups strictly according to degree of similarity, has serious shortcomings as it is commonly used in biology. When used as a method for estimating phylogeny, phenetic clustering rests on a questionable assumption of correspondence between similarity and recency of common ancestry. This compromises its ability to reconstruct the correct branching sequence when rates of evolutionary divergence are unequal among lineages, and causes it to obscure rate differences even when the branching sequence is reconstructed correctly. When used as a method for analysing patterns of geographic variation and genetic continuity among populations, phenetic clustering rests on a questionable assumption of correspondence between similarity and degree of genetic continuity. This compromises its ability to identify genetically continuous units when their component populations are differentiated; combined with its sensitivity to uneven geographic sampling, it can cause the method to yield misleading results if sampling patterns are not taken into consideration. Finally, even when used simply as a method for analysing patterns of similarity without regard to causal processes, phenetic clustering rests on a questionable assumption of nested hierarchical structure. This compromises its ability to represent similarity relationships accurately when those relationships exhibit a significant nonhierarchical component. For all of the common biological applications of phenetic clustering, there exist alternative analytical methods that do not suffer from these problems. The problems in question result not from the phenetic (similarity) data themselves, which often can be analysed in more appropriate ways, but from the phenetic clustering procedure. At least some of the limitations of phenetic clustering, as well as the advantages of alternative methods, have been known for many years. Advocacy of phenetic clustering at the expense of more appropriate methods can be explained as the result of constraints imposed by an implicit assumption of nested hierarchies that was part of the taxonomic context within which the methods were developed.
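
The point about unequal rates can be illustrated with a minimal sketch. It assumes a hypothetical three-taxon case, not taken from the paper: A and B are true sister lineages, but B has diverged five times as fast as A. The distances are additive on the true tree (the common ancestor of A and B lies 1 unit from A, 5 units from B, and 3 units from C). The `upgma` function is a simple illustrative implementation of average-linkage clustering, written here for the example only.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each cluster is (subtree, list_of_leaf_names).
    clusters = [(n, [n]) for n in names]

    def avg(c1, c2):
        # Mean distance over all leaf pairs drawn from the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical additive distances: true tree is ((A, B), C),
# with branch lengths A = 1, B = 5, and C = 3 from the A-B ancestor.
names = ["A", "B", "C"]
dist = {
    frozenset(("A", "B")): 6,  # 1 + 5
    frozenset(("A", "C")): 4,  # 1 + 3
    frozenset(("B", "C")): 8,  # 5 + 3
}

tree = upgma(names, dist)
print(tree)
```

Under these assumed distances the clustering joins A with C first, because they are the most similar pair, even though A shares a more recent common ancestor with B. The distance data themselves are consistent with the true tree; the error comes from grouping strictly by similarity.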
