HyperSAX: Fast Approximate Search of Multidimensional Data

The growing volume and size of data make indexing and searching increasingly difficult, especially for multidimensional data such as images and videos. In this paper we introduce a new indexable symbolic representation that lets us efficiently index and retrieve large amounts of data with multiple dimensions. We use an approximate lower-bounding distance measure to compare multidimensional arrays, which enables fast similarity search. We present two search methods, exact and approximate, that quickly retrieve data using this representation. The approach is general and works for many types of multidimensional data, including different image representations. Even for millions of multidimensional arrays, the approximate search returns a result within a few milliseconds, and in many cases that result is similar to the best match.
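The abstract does not spell out how the symbolic representation or its lower-bounding distance are built. The sketch below shows one plausible construction, and it rests on several assumptions. It assumes the representation extends SAX-style symbolization to 2D arrays. Each array is z-normalized and averaged over equal, non-overlapping blocks, a 2D analogue of PAA. Each block mean is then mapped to a symbol using standard-normal breakpoints. The sketch also assumes the distance is the SAX-style MINDIST, which never exceeds the Euclidean distance between the z-normalized arrays. All function names (`znorm`, `block_paa`, `symbolize`, `mindist`) are hypothetical. This is not the authors' implementation, and it omits variable cardinality and any index structure.

```python
import bisect
import math
from statistics import NormalDist


def znorm(grid):
    """Z-normalize a 2D list of floats using the mean/std of all cells."""
    flat = [v for row in grid for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    if std == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - mean) / std for v in row] for row in grid]


def block_paa(grid, br, bc):
    """Average a 2D array over a br x bc grid of equal, non-overlapping blocks."""
    rows, cols = len(grid), len(grid[0])
    assert rows % br == 0 and cols % bc == 0, "dimensions must divide evenly"
    h, w = rows // br, cols // bc
    means = []
    for i in range(br):
        for j in range(bc):
            block = [grid[r][c]
                     for r in range(i * h, (i + 1) * h)
                     for c in range(j * w, (j + 1) * w)]
            means.append(sum(block) / len(block))
    return means


def breakpoints(alphabet):
    """Cut points splitting N(0, 1) into `alphabet` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet) for k in range(1, alphabet)]


def symbolize(grid, br, bc, alphabet):
    """Map each block mean to a symbol s with beta[s-1] <= mean < beta[s]."""
    beta = breakpoints(alphabet)
    return [bisect.bisect_right(beta, m) for m in block_paa(znorm(grid), br, bc)]


def mindist(sym_a, sym_b, n, alphabet):
    """SAX-style lower bound on the Euclidean distance of the z-normalized arrays.

    n is the total number of cells per array; adjacent symbols contribute 0.
    """
    beta = breakpoints(alphabet)
    total = 0.0
    for r, c in zip(sym_a, sym_b):
        lo, hi = min(r, c), max(r, c)
        if hi - lo > 1:
            total += (beta[hi - 1] - beta[lo]) ** 2
    return math.sqrt(n / len(sym_a)) * math.sqrt(total)


def euclidean(a, b):
    """Plain Euclidean distance between two equally shaped 2D arrays."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))
```

For example, symbolizing two 4x4 arrays with 2x2 blocks and an alphabet of 4 gives 4-symbol words. `mindist` on those words is guaranteed not to exceed `euclidean(znorm(a), znorm(b))`. This guarantee is what lets an exact search discard candidates using only their symbolic words. The bound holds because, within each block, the squared differences of the cells sum to at least the block size times the squared difference of the block means.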
