Learning Linear Dependency Trees from Multivariate Time-series Data

Representing interactions between variables in large data sets in an understandable way is usually an important and hard task. This article presents a methodology for constructing a linear dependency structure between variables from multivariate data. The dependencies between the variables are specified by multiple linear regression models. A sparse regression algorithm and bootstrap-based resampling are used to estimate the models and to construct a belief graph. The belief graph highlights the most important mutual dependencies between the variables. Thresholding and graph operations may then be applied to the belief graph to obtain the final dependency structure, which is a tree or a forest. In the experimental section, the results of the proposed method on a real-world data set are realistic and convincing.
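
The pipeline described above (sparse regression of each variable on the others, bootstrap resampling, a belief graph, then thresholding and reduction to a forest) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the article's actual procedure: a plain coordinate-descent lasso stands in for the unspecified sparse regression algorithm, the belief of an edge is taken to be the fraction of bootstrap replicates in which the corresponding coefficient is nonzero, rows are resampled as if they were independent (ignoring temporal order), and a Kruskal-style maximum spanning forest stands in for the unspecified graph operations. The names `lasso_cd`, `belief_graph`, `dependency_forest`, `alpha`, and `threshold` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent lasso: minimizes (1/2n)||y - Xw||^2 + alpha*||w||_1.

    Stand-in for the sparse regression step (an assumption, not the
    article's algorithm).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial residual with predictor j left out.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # Soft-thresholding update for coefficient j.
            w[j] = np.sign(rho) * max(abs(rho) - n * alpha, 0.0) / col_sq[j]
    return w


def belief_graph(data, alpha=0.1, n_boot=50, seed=0):
    """Return a p x p matrix; entry (i, j) is the fraction of bootstrap
    replicates in which variable j was selected as a predictor of i."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        # Resample rows with replacement (assumes exchangeable rows).
        Z = data[rng.integers(0, n, size=n)]
        Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
        for i in range(p):
            others = [j for j in range(p) if j != i]
            w = lasso_cd(Z[:, others], Z[:, i], alpha)
            counts[i, others] += np.abs(w) > 1e-8
    return counts / n_boot


def dependency_forest(belief, threshold=0.5):
    """Threshold the symmetrized belief graph, then keep a maximum
    spanning forest (Kruskal with union-find). Returns (i, j, belief)."""
    p = belief.shape[0]
    sym = np.maximum(belief, belief.T)
    edges = sorted(
        ((sym[i, j], i, j)
         for i in range(p) for j in range(i + 1, p)
         if sym[i, j] >= threshold),
        reverse=True,
    )
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    forest = []
    for b, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge creates no cycle
            parent[ri] = rj
            forest.append((i, j, float(b)))
    return forest
```

Calling `dependency_forest(belief_graph(data))` on an `(n_samples, n_variables)` array yields a list of undirected edges. The result is a single tree when every variable is connected through edges above the threshold, and a forest otherwise.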
