Using Data as Observers: A New Paradigm for Prototypes Selection

Prototype selection is a bottleneck for many data analysis procedures. This paper proposes a new deterministic method for selecting prototypes, based on pairwise comparisons between data points. The data are ranked relative to each data point. We use the paradigm of an observer situated on a data point: the ranks relative to that point give the observer's viewpoint on the dataset. Two observers are linked if no data point lies between them from their respective viewpoints. The links are then directed, yielding a directed graph whose vertices are the data points. The observers move along the directed graph and reach a prototype when they arrive at a viewpoint with no outgoing connection. The method thus provides both a selection of prototypes and a structuring of the dataset through the directed graph. The paper also presents an assessment on three kinds of datasets. The method seems particularly useful when the classes are hard to distinguish with classical clustering methods.
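The pipeline described above (rank, link, direct, walk) can be illustrated with a minimal Python sketch. The abstract does not say how ranks are computed, how links are oriented, or how an observer chooses among several outgoing links, so the sketch makes three assumptions that are not taken from the paper: ranks come from Euclidean distance, each link points toward the endpoint with the lower total rank received from all observers (a hypothetical centrality score), and an observer always follows the outgoing link to the most central neighbour. All function names are illustrative.

```python
from math import dist


def rank_views(points):
    """ranks[i][j] = rank of point j as seen from observer i (0 = i itself).

    Assumption: ranks come from Euclidean distance, ties broken by index.
    """
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted(range(n),
                       key=lambda j: (dist(points[i], points[j]), j != i, j))
        row = [0] * n
        for r, j in enumerate(order):
            row[j] = r
        ranks.append(row)
    return ranks


def linked(ranks, i, j):
    """True if no third point is ranked before j from i AND before i from j."""
    return not any(
        ranks[i][k] < ranks[i][j] and ranks[j][k] < ranks[j][i]
        for k in range(len(ranks)) if k != i and k != j
    )


def build_graph(points):
    """Return (out, score): outgoing links per point and the centrality score."""
    ranks = rank_views(points)
    n = len(points)
    # Assumed orientation rule (not from the paper): a lower total rank
    # received from all observers means a more central point.
    score = [sum(ranks[i][j] for i in range(n)) for j in range(n)]
    out = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if linked(ranks, i, j):
                # Direct the link toward the more central endpoint.
                a, b = (i, j) if (score[j], j) < (score[i], i) else (j, i)
                out[a].append(b)
    return out, score


def find_prototype(out, score, start):
    """Follow outgoing links until a point with no outgoing link is reached."""
    node = start
    while out[node]:
        # Each step strictly decreases (score, index), so the walk terminates.
        node = min(out[node], key=lambda j: (score[j], j))
    return node


def prototypes(points):
    """Map every starting observer to the prototype it reaches."""
    out, score = build_graph(points)
    return {i: find_prototype(out, score, i) for i in range(len(points))}


print(prototypes([(0.0,), (1.0,), (2.0,)]))  # {0: 1, 1: 1, 2: 1}
```

In this three-point example the two outer points are not linked to each other, because the middle point lies between them from both viewpoints; both remaining links point to the middle point, which therefore has no outgoing link and is the prototype every observer reaches. Orienting links by a single global score keeps the graph acyclic, which is a convenience of this sketch and not a property claimed by the abstract.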
