Likelihood linkage analysis (LLA) classification method: an example treated by hand.

This paper describes a very general method of data analysis based on hierarchical classification. The data may come from observation, experiment or knowledge, and may be numerical, qualitative or logical in nature. First, we give a synthetic account of the classical framework of data representation within which the ascendant (agglomerative) hierarchical algorithm for building the classification tree is set. The central notion in our method is that of 'similarity'. It must be constructed as carefully as possible, taking into account the mathematical nature of the objects to be compared. Here we adopt a set-theoretic and combinatorial representation of the descriptive attributes, which are interpreted in terms of relations. We then introduce a probability scale for measuring similarity, based on a likelihood concept. The largest part of the paper is devoted to a moderately sized illustrative example, which details each step and each calculation required by the method. The data structure handled in this example is the simplest possible. General aspects and methodological extensions are then outlined. We end by indicating the relevance of the described approach to future work in which we are involved, concerning the typological organization of genetic sequences. We emphasize the 'explanation' aspect of the results obtained, relative to a given description. For this purpose, classifications (of the object set and of the attribute set) on the one hand, and machine learning techniques on the other, play an effective role.
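To make the idea of a probability scale for similarity concrete, the following is a minimal sketch and not the paper's worked example. It assumes Boolean attributes represented as sets of object indices, a hypergeometric no-relation hypothesis for the co-occurrence count, a normal approximation of its distribution, and an aggregation rule that raises the best pairwise similarity to the power of the product of the class sizes. The names `probabilistic_similarity` and `lla_sketch` are hypothetical, and any further normalization steps of the published method are not reproduced.

```python
import math
from itertools import combinations


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probabilistic_similarity(a, b, n):
    """Similarity on a [0, 1] probability scale between two Boolean attributes.

    a, b: sets of indices of the objects possessing each attribute.
    n: total number of objects.
    Assumption: under independence, the co-occurrence count |a & b| follows a
    hypergeometric law, approximated here by a normal law.
    """
    na, nb = len(a), len(b)
    s = len(a & b)
    mean = na * nb / n
    var = na * nb * (n - na) * (n - nb) / (n * n * (n - 1))
    if var == 0:
        return 0.5  # degenerate attribute: no information either way
    return normal_cdf((s - mean) / math.sqrt(var))


def lla_sketch(attributes, n):
    """Ascendant hierarchical classification of attributes.

    attributes: dict mapping attribute name -> set of object indices.
    Returns the list of merges as (class_a, class_b, linkage_value).
    """
    names = list(attributes)
    sim = {
        frozenset((x, y)): probabilistic_similarity(attributes[x], attributes[y], n)
        for x, y in combinations(names, 2)
    }

    def link(class_a, class_b):
        # Assumed aggregation rule: best pairwise similarity raised to the
        # power |A| * |B|, i.e. the probability that the maximum of
        # |A| * |B| independent uniform values does not exceed it.
        best = max(sim[frozenset((x, y))] for x in class_a for y in class_b)
        return best ** (len(class_a) * len(class_b))

    classes = [(name,) for name in names]
    merges = []
    while len(classes) > 1:
        class_a, class_b = max(combinations(classes, 2), key=lambda p: link(*p))
        merges.append((class_a, class_b, link(class_a, class_b)))
        classes = [c for c in classes if c not in (class_a, class_b)]
        classes.append(class_a + class_b)
    return merges
```

Calling `lla_sketch({"a": {0, 1, 2, 3}, "b": {0, 1, 2, 4}, "c": {5, 6, 7, 8}}, 10)` merges `a` and `b` first, since they share three of their four objects, and attaches `c` last. Comparing classes on a probability scale rather than by raw counts is what lets attributes of very different frequencies be compared on an equal footing.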
