Periodic pattern analysis of non-uniformly sampled stock market data

Periodic pattern detection is an important data mining task that highlights temporal regularities in data. It aims to determine whether a partial or full pattern repeats cyclically in a given time series or data sequence. Periodicity is found in a large number of datasets, including meteorological data, transaction counts, computer network traffic, power consumption, sunspots, electrocardiography (ECG) signals, and biological sequences such as DNA and proteins [33]. Periodic pattern analysis not only helps in understanding the behavior of the data but also contributes to predicting its future trends. Several algorithms for periodicity detection in time series and biological sequences have been reported in the literature [3,34], but none of them addresses non-uniformly sampled data. The general assumption for time series and sequence data is that consecutive values are sampled at regular, uniform time intervals. This assumption rarely holds in real datasets; for example, the stock market data analyzed in this paper record various features for each working day, so the data have quite a few missing values for weekly and arbitrary holidays. Handling this issue is not very complex, but it requires care. In this paper we analyze the stock market data in detail and show how periodic pattern analysis can provide an understanding of the data that helps predict future trends. Our experimental results show that taking the missing values in stock market data into account yields a much larger number of interesting results than the trivial periodicity detection approach that ignores them.
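The effect of ignoring missing days can be illustrated with a minimal sketch. It is not the paper's algorithm: the placeholder symbol, the helper names `to_uniform` and `period_confidence`, the toy symbols `u`/`d`, and the choice to skip missing slots when scoring are all assumptions made for illustration.

```python
from datetime import date, timedelta

MISSING = "*"  # hypothetical placeholder for calendar days with no record


def to_uniform(records):
    """Expand {date: symbol} into one symbol per calendar day, filling gaps."""
    days = sorted(records)
    start, end = days[0], days[-1]
    n = (end - start).days + 1
    return "".join(records.get(start + timedelta(days=i), MISSING) for i in range(n))


def period_confidence(seq, symbol, period, offset):
    """Fraction of positions offset, offset+period, ... that hold `symbol`.

    Slots holding MISSING are skipped rather than counted as mismatches
    (an assumption of this sketch).
    """
    slots = [seq[i] for i in range(offset, len(seq), period)]
    observed = [s for s in slots if s != MISSING]
    if not observed:
        return 0.0
    return sum(s == symbol for s in observed) / len(observed)


# Toy data: four weeks starting Monday 2024-01-01, weekdays only.
# The price goes up ('u') every Monday and down ('d') on other weekdays.
# Wednesday 2024-01-10 is dropped to mimic an arbitrary holiday.
records = {}
for i in range(26):
    day = date(2024, 1, 1) + timedelta(days=i)
    if day.weekday() < 5 and day != date(2024, 1, 10):
        records[day] = "u" if day.weekday() == 0 else "d"

seq = to_uniform(records)                               # gaps kept as '*'
compact = "".join(records[d] for d in sorted(records))  # gaps ignored

weekly_conf = period_confidence(seq, "u", 7, 0)       # true weekly period
naive_conf = period_confidence(compact, "u", 5, 0)    # "5 trading days" period
```

In this toy case the weekly pattern stays fully aligned in the gap-aware sequence, while in the compacted sequence a single arbitrary holiday shifts every later Monday out of phase, so the same pattern looks only partially periodic.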

[1]  Jignesh M. Patel,et al.  Practical methods for constructing suffix trees , 2005, The VLDB Journal.

[2]  Christos Faloutsos,et al.  AWSOM: Adaptive, Hands-Off Stream Mining , 2003 .

[3]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[4]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[5]  Mohammed Al-Shalalfa,et al.  Adapting Machine Learning Technique for Periodicity Detection in Nucleosomal Locations in Sequences , 2007, IDEAL.

[6]  Gregory Kucherov,et al.  Finding maximal repetitions in a word in linear time , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[7]  Wolfgang Gerlach,et al.  Compressed suffix tree - a basis for genome-scale sequence analysis , 2007, Bioinform..

[8]  Jiawei Han,et al.  Mining Segment-Wise Periodic Patterns in Time-Related Databases , 1998, KDD.

[9]  M. V. Katti,et al.  Amino acid repeat patterns in protein sequences: Their diversity and structural‐functional implications , 2000, Protein science : a publication of the Protein Society.

[10]  Frida Eng,et al.  Non-Uniform Sampling in Statistical Signal Processing , 2007 .

[11]  Joseph L. Hellerstein,et al.  Mining partially periodic event patterns with unknown periods , 2001, Proceedings 17th International Conference on Data Engineering.

[12]  Yuriy Reznik,et al.  On tries, suffix trees, and universal variable-length-to-block codes , 2002, Proceedings IEEE International Symposium on Information Theory.

[13]  Walid G. Aref,et al.  Multiple and Partial Periodicity Mining in Time Series Databases , 2002, ECAI.

[14]  Jie Chen,et al.  Detecting Periodic Patterns in Unevenly Spaced Gene Expression Time Series Using Lomb–Scargle Periodograms , 2022 .

[15]  Azzedine Lansari,et al.  A new non-recursive algorithm for binary search tree traversal , 2003, 10th IEEE International Conference on Electronics, Circuits and Systems, 2003. ICECS 2003. Proceedings of the 2003.

[16]  R. Coppel,et al.  Sequence variation in S-antigen genes of Plasmodium falciparum. , 1987, Molecular biology & medicine.

[17]  Maxime Crochemore,et al.  Algorithms on strings , 2007 .

[18]  Walid G. Aref,et al.  Periodicity detection in time series databases , 2005, IEEE Transactions on Knowledge and Data Engineering.

[19]  Zvi Galil,et al.  Faster tree pattern matching , 1994, JACM.

[20]  Ilya Shmulevich,et al.  Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data , 2007, BMC Bioinformatics.

[21]  Ronald K. Pearson,et al.  BMC Bioinformatics BioMed Central Methodology article , 2005 .

[22]  Philip S. Yu,et al.  InfoMiner+: mining partial periodic patterns with gap penalties , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[23]  Cyrus Shahabi,et al.  TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data , 2000, Proceedings. 12th International Conference on Scientific and Statistical Database Management.

[24]  Gesine Reinert,et al.  Probabilistic and Statistical Properties of Words: An Overview , 2000, J. Comput. Biol..

[25]  Walid G. Aref,et al.  WARP: time warping for periodicity detection , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[26]  Philip S. Yu,et al.  STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data , 2003, SDM.

[27]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[28]  Chia-Hui Chang,et al.  SMCA: a general model for mining asynchronous periodic patterns in temporal databases , 2005, IEEE Transactions on Knowledge and Data Engineering.

[29]  Mark Daniel Ward,et al.  Analysis of the average depth in a suffix tree under a Markov model , 2005 .

[30]  Esko Ukkonen,et al.  On-line construction of suffix trees , 1995, Algorithmica.

[31]  Jaakko Astola,et al.  Clustering the non-uniformly sampled time series of gene expression data , 2003, Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings..

[32]  Mohammed Al-Shalalfa,et al.  Adaptive Machine Learning Technique for Periodicity Detection in Biological Sequences , 2009, Int. J. Neural Syst..

[33]  Nitin Kumar,et al.  Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases , 2005, SDM.

[34]  Philip S. Yu,et al.  Infominer: mining surprising periodic patterns , 2001, KDD '01.

[35]  Mong-Li Lee,et al.  Mining Dense Periodic Patterns in Time Series Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[36]  Eamonn J. Keogh,et al.  Finding surprising patterns in a time series database in linear time and space , 2002, KDD.

[37]  Mohammed Al-Shalalfa,et al.  Efficient Periodicity Mining in Time Series Databases Using Suffix Trees , 2011, IEEE Transactions on Knowledge and Data Engineering.

[38]  Roberto Grossi,et al.  Suffix trees and their applications in string algorithms , 1993 .

[39]  Piotr Indyk,et al.  Identifying Representative Trends in Massive Time Series Data Sets Using Sketches , 2000, VLDB.

[40]  Eamonn J. Keogh,et al.  HOT SAX: efficiently finding the most unusual time series subsequence , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[41]  Hongjun Lu,et al.  Constructing suffix tree for gigabyte sequences with megabyte memory , 2005, IEEE Transactions on Knowledge and Data Engineering.

[42]  Christos Faloutsos,et al.  Adaptive, Hands-Off Stream Mining , 2003, VLDB.

[43]  N. Packard,et al.  Nonlinear analysis of data sampled nonuniformly in time , 1992 .

[44]  Andreas S. Weigend,et al.  Time Series Prediction: Forecasting the Future and Understanding the Past , 1994 .

[45]  Petre Stoica,et al.  Spectral analysis of nonuniformly sampled data - a review , 2010, Digit. Signal Process..