Low-Rank Spectral Learning with Weighted Loss Functions

Kulesza et al. [2014] recently observed that low-rank spectral learning algorithms, which discard the smallest singular values of a moment matrix during training, can behave in unexpected ways, producing large errors even when the discarded singular values are arbitrarily small. In this paper we prove that, when learning predictive state representations, those problematic cases disappear if we introduce a particular weighted loss function and learn using sufficiently large sets of statistics; our main result is a bound on the loss of the learned low-rank model in terms of the singular values that are discarded. Practically speaking, this suggests that regardless of the model rank we should use the largest possible sets of statistics, and we show empirically that this is true on both synthetic and real-world domains.
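As a minimal sketch of the truncation step described above, the following Python snippet discards the smallest singular values of a matrix and compares the approximation error with the discarded values. It is an illustration only, not the algorithm or the weighted loss analyzed in the paper: the function name `truncate_rank`, the random test matrix, and the use of NumPy are all assumptions made for this example, and the matrix is a hypothetical stand-in for an empirical moment matrix.

```python
import numpy as np


def truncate_rank(M, k):
    """Return a rank-k approximation of M and the singular values that were discarded."""
    # Thin SVD: M = U @ diag(s) @ Vt, with s sorted in decreasing order.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    M_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return M_k, s[k:]


# Hypothetical stand-in for a moment matrix: exactly rank 3, plus small noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
M = A + 1e-3 * rng.standard_normal((20, 15))

M_k, discarded = truncate_rank(M, 3)

# By the Eckart-Young theorem, the Frobenius error of the truncation equals
# the root of the sum of the squared discarded singular values.
frob_err = np.linalg.norm(M - M_k)
discarded_norm = np.sqrt(np.sum(discarded ** 2))
```

Here `frob_err` and `discarded_norm` agree up to floating-point error, so the matrix itself is well approximated whenever the discarded singular values are small. The observation of Kulesza et al. [2014] is that this alone does not guarantee a small error in the learned model, which is the gap that the weighted loss and larger sets of statistics are meant to close.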

[1] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.

[2] R. Varga, et al. Singular value decomposition Geršgorin sets, 2006.

[3] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.

[4] Le Song, et al. A Spectral Algorithm for Latent Tree Graphical Models, 2011, ICML.

[5] Geoffrey E. Hinton, et al. Generating Text with Recurrent Neural Networks, 2011, ICML.

[6] Raj Rao Nadakuditi, et al. The singular values and vectors of low rank perturbations of large rectangular random matrices, 2011, J. Multivar. Anal.

[7] Anima Anandkumar, et al. A Method of Moments for Mixture Models and Hidden Markov Models, 2012, COLT.

[8] Ariadna Quattoni, et al. Spectral learning of weighted automata, 2014, Machine Learning.

[9] R. Kennedy, et al. Hilbert Space Methods in Signal Processing, 2013.

[10] Alex Kulesza, et al. Low-Rank Spectral Learning, 2014, AISTATS.

[11] Karl Stratos, et al. Spectral learning of latent-variable PCFGs: algorithms and sample complexity, 2014, J. Mach. Learn. Res.

[13] Nan Jiang, et al. Spectral Learning of Predictive State Representations with Insufficient Statistics, 2015, AAAI.

[14] Amaury Habrard, et al. Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning, 2013, J. Mach. Learn. Res.