Data Science and Classification

Similarity and Dissimilarity
  A Tree-Based Similarity for Evaluating Concept Proximities in an Ontology
  Improved Fréchet Distance for Time Series
  Comparison of Distance Indices Between Partitions
  Design of Dissimilarity Measures: A New Dissimilarity Between Species Distribution Areas
  Dissimilarities for Web Usage Mining
  Properties and Performance of Shape Similarity Measures

Classification and Clustering
  Hierarchical Clustering for Boxplot Variables
  Evaluation of Allocation Rules Under Some Cost Constraints
  Crisp Partitions Induced by a Fuzzy Set
  Empirical Comparison of a Monothetic Divisive Clustering Method with the Ward and the k-means Clustering Methods
  Model Selection for the Binary Latent Class Model: A Monte Carlo Simulation
  Finding Meaningful and Stable Clusters Using Local Cluster Analysis
  Comparing Optimal Individual and Collective Assessment Procedures

Network and Graph Analysis
  Some Open Problem Sets for Generalized Blockmodeling
  Spectral Clustering and Multidimensional Scaling: A Unified View
  Analyzing the Structure of U.S. Patents Network
  Identifying and Classifying Social Groups: A Machine Learning Approach

Analysis of Symbolic Data
  Multidimensional Scaling of Histogram Dissimilarities
  Dependence and Interdependence Analysis for Interval-Valued Variables
  A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data
  Symbolic Clustering of Large Datasets
  A Dynamic Clustering Method for Mixed Feature-Type Symbolic Data

General Data Analysis Methods
  Iterated Boosting for Outlier Detection
  Sub-species of Homopus Areolatus? Biplots and Small Class Inference with Analysis of Distance
  Revised Boxplot Based Discretization as the Kernel of Automatic Interpretation of Classes Using Numerical Variables

Data and Web Mining
  Comparison of Two Methods for Detecting and Correcting Systematic Error in High-throughput Screening Data
  kNN Versus SVM in the Collaborative Filtering Framework
  Mining Association Rules in Folksonomies
  Empirical Analysis of Attribute-Aware Recommendation Algorithms with Variable Synthetic Data
  Patterns of Associations in Finite Sets of Items

Analysis of Music Data
  Generalized N-gram Measures for Melodic Similarity
  Evaluating Different Approaches to Measuring the Similarity of Melodies
  Using MCMC as a Stochastic Optimization Procedure for Musical Time Series
  Local Models in Register Classification by Timbre

Gene and Microarray Analysis
  Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection
  A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model
  New Efficient Algorithm for Modeling Partial and Complete Gene Transfer Scenarios
