Linking Data According to Their Degree of Representativeness (DoR)

This contribution addresses the problem of extracting representative data from complex datasets and connecting them in a directed graph. First, we define a degree of representativeness (DoR) inspired by the Borda voting procedure. Second, we present a method that connects data pairwise, using neighborhoods and the DoR as an objective function. We then present three case studies for illustrative purposes: unsupervised grouping of binary images, analysis of co-authorships in a research team, and structuring of a patient-oriented medical database.
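The abstract gives no formulas, so the following is only a rough sketch of how a Borda-style DoR and neighborhood-based linking might look, not the authors' actual definition. It assumes Euclidean distance, a fixed neighborhood size k, and a simple scoring rule in which each datum "votes" for its k nearest neighbors with k points for the closest down to 1 point for the k-th. The function names `borda_dor` and `link_by_dor` are hypothetical.

```python
from math import dist


def borda_dor(points, k):
    """Borda-style score: every point votes for its k nearest neighbors.

    Assumption (not from the source): the closest neighbor receives k
    points, the next k - 1, and so on; scores are divided by the maximum
    attainable total k * (n - 1) so that they lie in [0, 1].
    """
    n = len(points)
    scores = [0.0] * n
    neighborhoods = []
    for i in range(n):
        # Rank all other points by distance to point i (ties broken by index).
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (dist(points[i], points[j]), j),
        )
        hood = ranked[:k]
        neighborhoods.append(hood)
        for rank, j in enumerate(hood):
            scores[j] += k - rank
    max_total = k * (n - 1)
    return [s / max_total for s in scores], neighborhoods


def link_by_dor(points, k):
    """Link each point to the highest-DoR member of its neighborhood.

    A point whose own DoR is at least as high as that of every neighbor
    gets no outgoing link (None) and acts as a local representative.
    """
    dor, neighborhoods = borda_dor(points, k)
    links = {}
    for i, hood in enumerate(neighborhoods):
        best = max(hood, key=lambda j: dor[j])
        links[i] = best if dor[best] > dor[i] else None
    return dor, links


if __name__ == "__main__":
    # Five equally spaced points on a line; the middle one collects most votes.
    data = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    dor, links = link_by_dor(data, k=2)
    print(dor)
    print(links)
```

Under these assumptions, every link points toward a strictly higher DoR, so the resulting directed graph has no cycles and its roots are the candidate representatives. Other distance measures, neighborhood definitions, or vote weightings would change the graph.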
