Multilinear Algebra for Analyzing Data with Multiple Linkages

Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. Often, however, we want to analyze data that has multiple linkages between objects, e.g., two papers may share keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multilink analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types, using a tensor to represent all the links between them. A PARAFAC decomposition of the resulting tensor yields information similar to the singular value decomposition (SVD) of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and to define paper-paper similarities based on multiple linkages. Examples are presented in which the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit co-authorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published.
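To make the idea concrete, the following is a minimal sketch of a rank-R PARAFAC (CP) fit by alternating least squares on a small papers x papers x link-type tensor, followed by cosine similarities between the rows of the paper factor matrix. It is not the authors' implementation. The toy data (six papers in two groups, three link types), the function names `khatri_rao`, `unfold`, and `cp_als`, and the use of cosine similarity on the unscaled factor rows are all assumptions made for illustration.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J*K) x R from J x R and K x R."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Mode-n unfolding; the remaining modes keep their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=500, seed=0):
    """Fit a 3-way CP model X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product via a Hadamard product.
            gram = (first.T @ first) * (second.T @ second)
            # Least-squares update of one factor with the other two fixed.
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Hypothetical toy data: papers 0-2 form one group, papers 3-5 another.
groups = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
# Hypothetical strength of each group in each of three link types.
link_weights = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])
X = np.einsum("ir,jr,kr->ijk", groups, groups, link_weights)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Paper-paper similarity: cosine between rows of the first paper factor.
A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
sim = A_unit @ A_unit.T
print("relative reconstruction error:", rel_err)
print(np.round(sim, 2))
```

Each ALS step solves an ordinary least-squares problem for one factor matrix while the other two are held fixed, which is the matrix analogue of fitting one side of a low-rank factorization at a time. On real data the tensor would be sparse and far larger, so a dense NumPy array like this one would not scale.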
