Contrastive Learning Using Spectral Methods

In many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
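To make the moment-correction idea concrete, below is a minimal sketch (not the paper's implementation) of the contrastive principle: estimate co-occurrence moments separately from foreground and background data, subtract a scaled background moment from the foreground moment, and read latent directions off the corrected moment. The contrast weight gamma, the toy data, the helper name cooccurrence_moment, and the restriction to second-order moments are all illustrative assumptions; the paper itself works with higher-order tensor decompositions.

# Sketch of contrastive moment correction under the assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)

def cooccurrence_moment(docs, vocab_size):
    """Empirical second-order moment E[x x^T] from bag-of-words count vectors."""
    M = np.zeros((vocab_size, vocab_size))
    for counts in docs:
        x = counts / counts.sum()          # normalize counts to a distribution
        M += np.outer(x, x)
    return M / len(docs)

# Toy data: synthetic count vectors over a small vocabulary (hypothetical).
V = 20
foreground = [rng.multinomial(50, rng.dirichlet(np.ones(V))) for _ in range(200)]
background = [rng.multinomial(50, rng.dirichlet(np.ones(V))) for _ in range(200)]

M_fg = cooccurrence_moment(foreground, V)
M_bg = cooccurrence_moment(background, V)

gamma = 1.0                                # contrast weight (assumed value)
M_contrast = M_fg - gamma * M_bg           # background-corrected moment

# Leading eigenvectors of the corrected moment point at foreground-specific
# structure; in the full method such statistics would feed a whitening and
# tensor decomposition step rather than a plain eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(M_contrast)
top_directions = eigvecs[:, np.argsort(eigvals)[::-1][:3]]
print(top_directions.shape)                # (V, 3)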
