Using dependencies to pair samples for multi-view learning

Several data analysis tools, such as (kernel) canonical correlation analysis and various multi-view learning methods, require paired observations in two data sets. We study the problem of inferring such a pairing for data sets with no known one-to-one pairing. The pairing is found by an iterative algorithm that alternates between searching for feature representations that reveal statistical dependencies between the data sets and finding the best pairs for the samples. The method is applied to pairing probe sets of two different microarray platforms.
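The alternation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes regularized linear CCA as the dependency-revealing representation, squared Euclidean distance in the projected space as the matching cost, the Hungarian algorithm (SciPy's `linear_sum_assignment`) for the pairing step, and equal sample counts in both data sets. The function names `cca_projections` and `pair_samples` and all parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cca_projections(X, Y, n_components=2, reg=1e-3):
    """Regularized linear CCA (an assumed stand-in for the dependency model).

    Returns projection matrices Wx, Wy for the centered views.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance; singular vectors give the canonical directions.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, _, Vt = np.linalg.svd(M)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return Wx, Wy


def pair_samples(X, Y, n_iter=10, n_components=2, init=None, seed=0):
    """Alternate between fitting CCA on the current pairing and re-pairing.

    Returns perm such that sample X[i] is paired with Y[perm[i]].
    Assumes X and Y have the same number of rows.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n) if init is None else np.asarray(init)
    for _ in range(n_iter):
        # Step 1: representation that reveals dependency under current pairs.
        Wx, Wy = cca_projections(X, Y[perm], n_components)
        Zx = (X - X.mean(axis=0)) @ Wx
        Zy = (Y - Y.mean(axis=0)) @ Wy
        # Step 2: best one-to-one pairing in the projected space.
        cost = ((Zx[:, None, :] - Zy[None, :, :]) ** 2).sum(axis=-1)
        _, new_perm = linear_sum_assignment(cost)
        if np.array_equal(new_perm, perm):
            break  # pairing no longer changes
        perm = new_perm
    return perm
```

Calling `pair_samples(X, Y)` on two arrays with the same number of rows returns a permutation of the row indices of `Y`. Such an alternating scheme can stop at a local optimum, so the result depends on the initial pairing passed through `init` or drawn from `seed`.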
