Data fusion and matching by maximizing statistical dependencies

Multi-view learning is the task of learning from multiple data sources, where each source represents a different view of the same phenomenon. Typical examples include multimodal information retrieval and the classification of genes by combining heterogeneous genomic data. Multi-view learning methods can be motivated by two interrelated lines of thought. First, if a single view is not sufficient for the learning task, other views can complement the information. Second, learning by searching for an agreement between views may generalize better than learning from a single view. In this thesis, novel methods for unsupervised multi-view learning are proposed. Multi-view learning methods, in general, work by searching for an agreement between views. However, defining an agreement is not straightforward in an unsupervised learning task. In this thesis, statistical dependency is used to define an agreement between the views. Assuming that the information shared between the views is more interesting than the information specific to a single view, statistical dependency is used to find the shared information. Based on this principle, a fast linear preprocessing method that performs data fusion during exploratory data analysis is introduced. Also, a novel evaluation approach, based on the dependency between views, for comparing vector representations of bilingual corpora is introduced. Multi-view learning methods in general assume co-occurring samples for the
