Knowledge Transfer with Low-Quality Data: A Feature Extraction Issue

Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge between sources of data, particularly when ground-truth information for the new modeling task is scarce or expensive to collect, so that leveraging auxiliary data sources becomes a necessity. Toward seamless knowledge transfer among tasks, effective representation of the data is a critical yet not fully explored research area for the data engineer and data miner. Here, we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing on the identification of clusters shared between the source and target data, whose distributions may differ. We point out cases where a direct application of sparse coding leads to a failure of knowledge transfer. We then present the details of our extension to sparse coding, which incorporates distribution distance estimates for the embedded data, and show that the proposed algorithm overcomes the shortcomings of plain sparse coding on synthetic data and achieves improved predictive performance on a real-world chemical toxicity transfer learning task.
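The pipeline the abstract describes — sparse-code both domains over a shared dictionary, then train on the source codes while monitoring how far apart the embedded distributions are — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the synthetic data, dictionary size, and regularization strength are arbitrary choices, and the paper integrates the distribution distance estimate into the coding objective itself, whereas here a simple linear-kernel MMD-style distance is only computed after the fact for inspection.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for the paper's setting: source and target drawn from the
# same low-dimensional subspaces, with a small covariate shift on the target.
basis = rng.normal(size=(4, 20))                # 4 shared latent directions
src = rng.normal(size=(100, 4)) @ basis        # source domain (labeled)
tgt = rng.normal(size=(60, 4)) @ basis + 0.1   # target domain (shifted)
y_src = (src[:, 0] > 0).astype(int)            # synthetic source labels

# Sparse coding: learn one dictionary on the pooled data, then represent
# each point by its sparse code (subspace-membership-style features).
dico = DictionaryLearning(n_components=8, alpha=1.0, max_iter=50,
                          transform_algorithm="lasso_lars", random_state=0)
dico.fit(np.vstack([src, tgt]))
z_src = dico.transform(src)
z_tgt = dico.transform(tgt)

def mmd_linear(a, b):
    """Linear-kernel MMD estimate: squared distance between embedding means."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

# Train on source codes, predict on target codes.
clf = LogisticRegression().fit(z_src, y_src)
y_tgt_pred = clf.predict(z_tgt)
print("embedded distribution distance:", mmd_linear(z_src, z_tgt))
print("target predictions shape:", y_tgt_pred.shape)
```

When the embedded distribution distance is large, the failure mode the abstract warns about can occur: the source-trained classifier sees target codes unlike anything in its training distribution, which is exactly why the proposed extension penalizes that distance during coding rather than measuring it afterward.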
