Subseries Join and Compression of Time Series Data Based on Non-uniform Segmentation

A time series is a sequence of data items measured at uniform intervals. Many application areas generate or manipulate time series, including finance, medicine, digital audio, and motion capture. Efficiently searching a large time series database remains a challenging problem, especially when partial or subseries matches are needed. This thesis proposes a new definition of subseries join, a symmetric generalization of subseries matching that finds similar subseries in two or more time series datasets. A solution is proposed that computes the subseries join from a hierarchical feature representation, which is generated by an anisotropic diffusion scale-space analysis and a non-uniform segmentation method. Each segment is represented by a minimal polynomial envelope in a reduced-dimensionality space. All features in a dataset are indexed in an R-tree, and candidate matching features of two datasets are found by an R-tree join operation. Given the candidate matching features, a dynamic programming algorithm computes the final subseries join. To improve storage efficiency, a hierarchical compression scheme is proposed to compress the features: the minimal polynomial envelope representation is transformed to a Bézier spline envelope representation, the control points of each Bézier spline are hierarchically differenced, and arithmetic coding is used to compress the differences. The proposed subseries join and compression techniques are evaluated empirically on various publicly available datasets, and a large motion capture database is used to verify them in a real-world application. The experiments show that the proposed subseries join technique tolerates noise and local scaling better than previous work, and that the proposed compression technique achieves about 85% higher compression rates than previous work at the same distortion error.
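As a rough illustration of the anisotropic diffusion step mentioned above, the following sketch smooths a one-dimensional series with a Perona-Malik style scheme: small fluctuations are averaged out while large jumps, which would act as segment boundaries, are preserved. This is only a minimal sketch under stated assumptions, not the thesis's implementation. The function name `perona_malik_1d`, the exponential conductance function, and the parameters `kappa`, `step`, and `iterations` are all illustrative choices.

```python
import math


def perona_malik_1d(series, iterations=20, kappa=1.0, step=0.2):
    """Smooth a 1D series with an explicit Perona-Malik diffusion scheme.

    kappa (assumed parameter): differences much larger than kappa are treated
        as edges and receive almost no smoothing.
    step (assumed parameter): time step; values up to 0.5 keep this explicit
        1D scheme stable because the conductance never exceeds 1.
    """
    values = [float(v) for v in series]
    n = len(values)
    for _ in range(iterations):
        # Forward differences between neighbouring samples.
        diffs = [values[i + 1] - values[i] for i in range(n - 1)]
        # Flux = conductance * difference; conductance decays with edge strength.
        flux = [math.exp(-((d / kappa) ** 2)) * d for d in diffs]
        updated = values[:]
        for i in range(n):
            right = flux[i] if i < n - 1 else 0.0  # zero flux at the right end
            left = flux[i - 1] if i > 0 else 0.0   # zero flux at the left end
            updated[i] = values[i] + step * (right - left)
        values = updated
    return values


# A noisy step: the small ripples are smoothed, the jump near index 4 stays.
noisy_step = [0.1, -0.1, 0.1, -0.1, 10.1, 9.9, 10.1, 9.9]
smoothed = perona_malik_1d(noisy_step)
```

Running the scheme for increasing numbers of iterations yields progressively coarser versions of the series, which is one way a scale-space hierarchy can be built. How the thesis derives segments and features from such a hierarchy is not specified in this abstract.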
