Seriation in the Presence of Errors: NP-Hardness of l∞-Fitting Robinson Structures to Dissimilarity Matrices

In this paper, we establish that the following fitting problem is NP-hard: given a finite set X and a dissimilarity measure d on X (a symmetric function from X × X to the nonnegative real numbers that vanishes on the diagonal), find a Robinsonian dissimilarity d_R on X minimizing the l∞-error ||d − d_R||∞ = max_{x,y∈X} |d(x, y) − d_R(x, y)| between d and d_R. Recall that a dissimilarity d_R on X is called monotone (or Robinsonian) if there exists a total order ≺ on X such that x ≺ z ≺ y implies d_R(x, y) ≥ max{d_R(x, z), d_R(z, y)}. Robinsonian dissimilarities appear in seriation and clustering problems, in sparse matrix ordering, and in DNA sequencing.
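The two definitions above can be made concrete with a minimal Python sketch. It assumes dissimilarities are stored as square lists of lists indexed by 0..n-1; the function names (`linf_error`, `is_robinson_for_order`, `is_robinsonian_bruteforce`) are hypothetical and chosen only for illustration. The brute-force search over all orders is exponential and is not an algorithm from the paper; it only spells out the "there exists a total order" clause on tiny inputs.

```python
from itertools import combinations, permutations


def linf_error(d, d_r):
    """Return the l-infinity error: max over all pairs of |d - d_r|."""
    n = len(d)
    return max(
        (abs(d[i][j] - d_r[i][j]) for i in range(n) for j in range(n)),
        default=0,
    )


def is_robinson_for_order(d, order):
    """Check the monotonicity condition for one fixed total order.

    `order` lists the elements from first to last. For every triple
    x < z < y in that order we require d(x, y) >= max(d(x, z), d(z, y)).
    """
    for a, b, c in combinations(range(len(order)), 3):
        x, z, y = order[a], order[b], order[c]
        if d[x][y] < max(d[x][z], d[z][y]):
            return False
    return True


def is_robinsonian_bruteforce(d):
    """Illustrative only: try every total order (exponential time)."""
    return any(
        is_robinson_for_order(d, order)
        for order in permutations(range(len(d)))
    )


# Hypothetical 3-point example (values chosen for illustration).
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
d_r = [[0, 1, 2],
       [1, 0, 2],
       [2, 2, 0]]

print(is_robinson_for_order(d, [0, 1, 2]))  # order 0 < 1 < 2 is compatible
print(is_robinson_for_order(d, [0, 2, 1]))  # order 0 < 2 < 1 is not
print(linf_error(d, d_r))
```

Checking a single candidate order takes cubic time in this sketch; the hardness result concerns the optimization problem of choosing d_R (and hence an order) that minimizes the error, which this code does not attempt to solve.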
