Divisive Monothetic Clustering for Interval and Histogram-valued Data

In this paper we propose a divisive, top-down clustering method for interval- and histogram-valued data. The method produces a hierarchy on a set of objects together with a monothetic characterization of each cluster formed. At each step, a cluster is split so as to minimize intra-cluster dispersion, measured using a distance suitable for the variable types considered. The criterion is minimized over the bipartitions induced by a set of binary questions. Since interval-valued variables may be considered a special case of histogram-valued variables, the method applies to data described by either kind of variable, or by variables of both types. An example illustrates the proposed approach.
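
As a rough illustration of one split step, the sketch below handles interval-valued data only. It is not the paper's implementation. It assumes a squared Euclidean distance on interval bounds as the dispersion measure, and binary questions of the form "is the midpoint of variable j at most c?". Both choices, and the names `dispersion` and `best_split`, are hypothetical and chosen for brevity.

```python
from typing import List, Optional, Tuple

# An object is described by one (lower, upper) interval per variable.
Interval = Tuple[float, float]
Obj = List[Interval]


def midpoint(iv: Interval) -> float:
    return (iv[0] + iv[1]) / 2


def dispersion(cluster: List[Obj]) -> float:
    """Sum of squared distances from each object to the cluster's mean interval.

    Assumption: squared Euclidean distance on the interval bounds.
    """
    if not cluster:
        return 0.0
    n = len(cluster)
    total = 0.0
    for j in range(len(cluster[0])):
        mean_lo = sum(o[j][0] for o in cluster) / n
        mean_hi = sum(o[j][1] for o in cluster) / n
        total += sum(
            (o[j][0] - mean_lo) ** 2 + (o[j][1] - mean_hi) ** 2 for o in cluster
        )
    return total


def best_split(
    cluster: List[Obj],
) -> Optional[Tuple[int, float, List[Obj], List[Obj]]]:
    """Try every binary question and keep the bipartition with the lowest
    total within-cluster dispersion.

    Returns (variable index, cut value, left, right), or None when no
    question separates the objects.
    """
    best = None
    best_cost = float("inf")
    for j in range(len(cluster[0])):
        # Candidate cuts lie halfway between consecutive distinct midpoints.
        mids = sorted({midpoint(o[j]) for o in cluster})
        for a, b in zip(mids, mids[1:]):
            cut = (a + b) / 2
            left = [o for o in cluster if midpoint(o[j]) <= cut]
            right = [o for o in cluster if midpoint(o[j]) > cut]
            cost = dispersion(left) + dispersion(right)
            if cost < best_cost:
                best_cost = cost
                best = (j, cut, left, right)
    return best
```

Applying `best_split` repeatedly to the leaf clusters would yield a hierarchy, and the sequence of (variable, cut) pairs along each branch gives the monothetic description of the corresponding cluster. Extending the sketch to histogram-valued variables would require a different distance and a different family of binary questions, which are not reproduced here.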
