SAXually Explicit Images: Finding Unusual Shapes

Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, anthropology, and medicine. The brute-force search algorithm has quadratic time complexity. We avoid this cost by using locality-sensitive hashing to estimate the similarity between shapes, which lets us reorder the search so that most candidates can be discarded early. An extensive experimental evaluation demonstrates that our approach can speed up computation by three to four orders of magnitude.
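To make the quadratic baseline concrete, the following is a minimal sketch of brute-force discord search, not the authors' implementation. It assumes each shape has already been converted to an equal-length one-dimensional sequence, uses plain Euclidean distance, and ignores rotation invariance. The names `find_discord` and `outer_order` are hypothetical. The `outer_order` argument stands in for whatever ordering a similarity estimate (for example, one obtained by hashing) might supply.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, outer_order=None):
    """Return (index, distance) of the item farthest from its nearest neighbor.

    series      -- list of equal-length numeric sequences (assumed representation)
    outer_order -- optional visiting order for candidates (hypothetical hook
                   for a heuristic ordering); defaults to 0..n-1
    """
    n = len(series)
    order = list(outer_order) if outer_order is not None else list(range(n))
    best_dist, best_idx = -1.0, None
    for i in order:
        nearest = math.inf
        for j in range(n):
            if i == j:
                continue
            d = euclidean(series[i], series[j])
            if d < nearest:
                nearest = d
            if nearest < best_dist:
                # Early abandon: item i already has a neighbor closer than
                # the best discord found so far, so it cannot be the discord.
                break
        if nearest > best_dist:
            best_dist, best_idx = nearest, i
    return best_idx, best_dist


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10]]
    print(find_discord(data))
```

In the worst case both loops run to completion, which gives the quadratic cost. The early-abandon test is what makes the visiting order matter: if a strong candidate is examined first, `best_dist` becomes large quickly and most later candidates are rejected after only a few distance computations.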
