Interactive Exploration of Multitask Dependency Networks

Scientists increasingly depend on machine learning algorithms to discover patterns in complex data. Two examples addressed in this dissertation are identifying how information sharing among regions of the brain develops as a result of learning, and learning dependency networks of blood proteins associated with cancer. Dependency networks, or graphical models, are learned from the observed data in order to compare the sub-populations of the dataset. Rarely is there sufficient data to infer robust individual networks for each sub-population. The multiple networks must therefore be considered simultaneously, which explodes the hypothesis space of the learning problem. Exploring this complex solution space requires input from the domain scientist to refine the objective function. This dissertation introduces a framework to incorporate domain knowledge in transfer learning to facilitate the exploration of solutions. The framework is a generalization of existing algorithms for multiple network structure identification. Human input narrows the set of candidate solutions to those that answer questions of interest to domain scientists. Patterns, such as differences between networks, are learned with higher confidence using transfer learning than through the standard method of bootstrapping. Transfer learning may be the
