An Efficient Metric Combinatorial Algorithm for Fitting Additive Trees.

A new combinatorial algorithm for fitting additive trees to proximity data is described. This algorithm, termed the "generalized triples" or GT method, proceeds by examining all triples of objects x, y, u in relation to the remaining set of objects to be clustered. For a given focal object, say x, the algorithm determines whether y or u is x's nearest neighbor using estimates derived from the distances of these objects to each other and the saved sums of distances of these objects to the remaining objects in the set. The result is a basic computational loop that is approximately O(n^3). This idea is applied in a sequential agglomerative algorithm, with all pairs of objects that are mutual nearest neighbors (based on the above estimates) being joined at each stage. A simple version of the algorithm can be proven to find the correct solution if the dissimilarity matrix D actually satisfies the additive tree metric. The algorithm also works well on errorful data (i.e., data that cannot be modeled perfectly by an additive tree). A simulation study demonstrates that the GT algorithm works as effectively as the Sattath and Tversky algorithm (Corter, 1982; Sattath & Tversky, 1977) in terms of fit of the obtained solutions, and is faster for moderate- to large-sized data sets, especially in the presence of error. A second simulation study shows that the GT algorithm obtains comparable fits to De Soete's (1983) algorithm, with large savings in computation time.
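The condition that D "satisfies the additive tree metric" is commonly stated as the four-point condition: for any four objects, the two largest of the three pairwise distance sums must be equal. The sketch below checks that condition on a small matrix. It is not the GT algorithm itself, whose nearest-neighbor estimates are not specified in this abstract; the function name `satisfies_four_point`, the tolerance `tol`, and the example matrices are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Return True if the symmetric distance matrix d meets the four-point condition.

    For every quadruple (i, j, k, l), the two largest of the three sums
    d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k] must be equal
    (within tol). This loop is O(n^4) and is meant only as a check.
    """
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical example: leaves a, b hang off one internal node (edge lengths 1, 2),
# leaves c, d hang off another (edge lengths 4, 5), and the internal edge has length 3.
tree_d = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]

# Same matrix with the a-d distance perturbed, standing in for errorful data.
noisy_d = [row[:] for row in tree_d]
noisy_d[0][3] = noisy_d[3][0] = 11

print(satisfies_four_point(tree_d))
print(satisfies_four_point(noisy_d))
```

On the first matrix the three sums are 12, 18, and 18, so the condition holds; after the perturbation they are 12, 18, and 20, so it fails.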

[1]  G. De Soete. A least squares algorithm for fitting additive trees to proximity data, 1983.

[2]  Hervé Abdi, et al. Additive-Tree Representations, 1990.

[3]  L. Hubert, et al. Combinatorial Data Analysis, 1992.

[4]  L. Hubert, et al. Iterative projection strategies for the least-squares fitting of tree structures to proximity data, 1995.

[5]  H. Abdi, et al. Tree Representations of Associative Structures in Semantic and Episodic Memory Research, 1984.

[6]  A. Tversky, et al. Representations of perceptions of risks, 1984.

[7]  J. Cunningham, et al. Free trees and bidirectional trees as representations of psychological distance, 1978.

[8]  B. Tversky, et al. Descriptions and depictions of environments, 1992, Memory & Cognition.

[9]  G. De Soete. Additive-tree representations of incomplete dissimilarity data, 1984.

[10]  James E. Corter. ADDTREE/P: A PASCAL program for fitting additive trees based on Sattath and Tversky’s ADDTREE algorithm, 1982.

[11]  Wayne S. DeSarbo, et al. Chapter 5 Non-spatial tree models for the assessment of competitive market structure: An integrated review of the marketing and psychometric literature, 1993, Marketing.

[12]  A. Tversky, et al. Additive similarity trees, 1977.

[13]  A. Dobson. Unrooted trees for numerical taxonomy, 1974, Journal of Applied Probability.

[14]  Geert De Soete, et al. Tree and other network models for representing proximity data, 1996.

[15]  George W. Furnas, et al. Metric family portraits, 1989.

[16]  S. Hakimi, et al. The distance matrix of a graph and its tree realization, 1972.

[17]  A. Tversky, et al. Spatial versus tree representations of proximity data, 1982.