Network Topology Identification using PCA and its Graph Theoretic Interpretations

We solve the problem of identifying (reconstructing) network topology from steady-state network measurements. Concretely, given only a data matrix $\mathbf{X}$ whose entry $X_{ij}$ is the flow in edge $i$ in configuration (steady state) $j$, we wish to find a network structure for which flow is conserved at every node. This setting models many networks that carry conserved quantities, such as water, power, and metabolic networks. We show that identification is equivalent to learning a model $\mathbf{A}_n$ that captures the approximate linear relationships among the variables comprising $\mathbf{X}$ (i.e., relationships of the form $\mathbf{A}_n \mathbf{X} \approx \mathbf{0}$), such that $\mathbf{A}_n$ has the highest possible rank (full rank) and is consistent with a network node-edge incidence structure. The problem is solved through a sequence of steps: estimating the approximate linear relationships using Principal Component Analysis (PCA), obtaining fundamental cut-sets (f-cut-sets) from these relationships, and realizing a graph from the f-cut-sets (or, equivalently, from the fundamental circuits, or f-circuits). Each step, and hence the overall procedure, runs in polynomial time. The method is illustrated by identifying the topology of a water distribution network. We also study the extent to which topology is identifiable from steady-state data.
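The first two steps of this pipeline can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the 4-node, 5-edge network, the helper names (`incidence_matrix`, `simulate_flows`, `pca_constraint_model`, `pick_tree_columns`, `fcutset_matrix`), the noise level, and the use of uncentred PCA computed as an SVD of $\mathbf{X}$ are all hypothetical choices. It assumes a connected network, so that $n-1$ independent constraints are expected for $n$ nodes, and it obtains f-cut-sets by choosing a spanning tree and re-expressing the rows of $\mathbf{A}_n$ so that the tree columns form an identity block. The graph-realization step is not shown.

```python
import numpy as np

# Hypothetical 4-node, 5-edge network (illustrative only, not from the paper).
# Edge k is oriented from EDGES[k][0] to EDGES[k][1].
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N_NODES = 4


def incidence_matrix(edges, n_nodes):
    """Node-edge incidence matrix: +1 where an edge leaves a node, -1 where it enters."""
    B = np.zeros((n_nodes, len(edges)))
    for k, (u, v) in enumerate(edges):
        B[u, k] = 1.0
        B[v, k] = -1.0
    return B


def simulate_flows(B, n_samples, noise_std, rng):
    """Synthetic steady-state flows (edges x samples) obeying B @ X = 0, plus noise."""
    _, s, Vt = np.linalg.svd(B)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T  # orthonormal basis of the null space of B (the cycle space)
    X = N @ rng.standard_normal((N.shape[1], n_samples))
    return X + noise_std * rng.standard_normal(X.shape)


def pca_constraint_model(X, n_constraints):
    """Assumption: uncentred PCA via SVD. The least-variance directions in edge space
    become the rows of A_n, so that A_n @ X is approximately zero.
    Assumes more samples than edges, so U spans the whole edge space."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, -n_constraints:].T


def pick_tree_columns(A, tol=1e-2):
    """Greedily pick rank(A) independent columns. When A has the row space of an
    incidence matrix, these columns are the branches of a spanning tree.
    The absolute tolerance tol is a heuristic for noisy data."""
    chosen = []
    for j in range(A.shape[1]):
        trial = chosen + [j]
        if np.linalg.matrix_rank(A[:, trial], tol=tol) == len(trial):
            chosen = trial
        if len(chosen) == A.shape[0]:
            break
    return chosen


def fcutset_matrix(A, tree):
    """Change basis within the row space of A so the tree columns form an identity block."""
    return np.linalg.solve(A[:, tree], A)


rng = np.random.default_rng(0)
B = incidence_matrix(EDGES, N_NODES)
X = simulate_flows(B, n_samples=200, noise_std=1e-3, rng=rng)

# A gap in the singular values suggests how many constraints to keep;
# here we simply use n_nodes - 1, the rank of a connected graph's incidence matrix.
print("singular values of X:", np.round(np.linalg.svd(X, compute_uv=False), 3))
A_n = pca_constraint_model(X, n_constraints=N_NODES - 1)
residual = np.linalg.norm(A_n @ X) / np.linalg.norm(X)
print("relative residual ||A_n X|| / ||X||:", residual)

tree = pick_tree_columns(A_n)
C_est = np.rint(fcutset_matrix(A_n, tree)).astype(int)
# Ground truth: same construction on the incidence matrix with one reference node dropped.
C_true = np.rint(fcutset_matrix(B[:-1], tree)).astype(int)
print("tree edges:", [EDGES[k] for k in tree])
print(C_est)
print("matches true f-cut-set matrix:", np.array_equal(C_est, C_true))
```

Because $\mathbf{A}_n$ and the reduced incidence matrix share a row space, forcing an identity block on the same tree columns yields the same matrix in both cases, so rounding the noisy estimate should recover the true f-cut-set matrix. For the tree branches (0,1), (1,2), (2,3) its rows are [1, 0, 0, -1, 1], [0, 1, 0, -1, 1] and [0, 0, 1, -1, 0], each listing the edges that cross the cut formed by deleting that branch.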
