Contribution to data clustering methods via pretopological and opinion-aggregation approaches (French title: Contribution aux méthodes de classification non supervisée via des approches prétopologiques et d'agrégation d'opinions)

This thesis studies automatic data classification (clustering) methods, which are well known to exhibit a "method effect": different methods applied to the same data can produce different partitions. A first part presents the general problem of data analysis and surveys clustering methods. The thesis then sets out its original work, which follows three interconnected approaches: one based on opinion aggregation, one based on pretopology, and one based on preference aggregation. Each approach rests on a different paradigm and offers a new view of clustering techniques, one that can, where appropriate, bring exogenous information into the method.
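To make the opinion-aggregation and pretopological ideas more concrete, here is a minimal Python sketch under stated assumptions; it is not the procedure developed in the thesis. Each clustering method is treated as a voter whose "opinion" is a partition, given as a list of cluster labels. Two objects are linked when a strict majority of the partitions place them in the same cluster (a pairwise, Condorcet-style vote). Because that majority relation need not be transitive, clusters are then formed with a pretopological pseudoclosure, here the hypothetical choice a(A) = A ∪ {x : x has a linked neighbour in A}, iterated until it stops growing. The function names and the three input partitions are invented for illustration. This heuristic is not the exact Condorcet median partition, which is usually computed by integer programming, and taking closures can chain otherwise distinct groups together.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def majority_relation(partitions: List[List[int]], n: int) -> Set[Tuple[int, int]]:
    """Pairs (i, j), i < j, that a strict majority of the partitions co-cluster."""
    m = len(partitions)
    linked = set()
    for i, j in combinations(range(n), 2):
        votes = sum(1 for labels in partitions if labels[i] == labels[j])
        if 2 * votes > m:  # strict majority of the m "voters" (methods)
            linked.add((i, j))
    return linked


def pseudoclosure(subset: Set[int], neighbours: Dict[int, Set[int]]) -> Set[int]:
    """Hypothetical pseudoclosure a(A): A plus every element with a neighbour in A.
    It satisfies a(empty set) = empty set and A is a subset of a(A)."""
    result = set(subset)
    for x, nbrs in neighbours.items():
        if nbrs & subset:
            result.add(x)
    return result


def closure(subset: Set[int], neighbours: Dict[int, Set[int]]) -> Set[int]:
    """Iterate the pseudoclosure until it stops growing (a fixed point)."""
    current = set(subset)
    while True:
        expanded = pseudoclosure(current, neighbours)
        if expanded == current:
            return current
        current = expanded


def consensus_clusters(partitions: List[List[int]], n: int) -> List[List[int]]:
    """Group objects by the closures of singletons under the majority relation."""
    neighbours: Dict[int, Set[int]] = {x: set() for x in range(n)}
    for i, j in majority_relation(partitions, n):
        neighbours[i].add(j)
        neighbours[j].add(i)
    clusters, seen = [], set()
    for x in range(n):
        if x not in seen:
            group = closure({x}, neighbours)
            clusters.append(sorted(group))
            seen |= group
    return clusters


if __name__ == "__main__":
    # Three hypothetical partitions of six objects, as three methods might return them.
    p1 = [0, 0, 0, 1, 1, 1]
    p2 = [0, 0, 1, 1, 1, 1]
    p3 = [0, 0, 0, 1, 1, 2]
    print(consensus_clusters([p1, p2, p3], 6))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy example the three "methods" disagree about objects 2 and 5, which is a small instance of the method effect, yet the majority relation links {0, 1, 2} and {3, 4, 5}, so the aggregated result is [[0, 1, 2], [3, 4, 5]].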
