ISaaC: Identifying Structural Relations in Biological Data with Copula-Based Kernel Dependency Measures

This paper develops a novel statistical framework for inferring dependence between the distributions of variables in omics data. We propose building a dependence network with copula-based kernel dependency measures to reconstruct the underlying association network between the distributions. We apply ISaaC to reverse-engineering gene regulatory networks, where it is competitive with several state-of-the-art gene regulatory inference methods on the DREAM3 and DREAM4 Challenge datasets. An open-source implementation of ISaaC is available at https://bitbucket.org/HossamAlmeer/isaac/.
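To make the idea of a copula-based kernel dependence network concrete, the following is a minimal sketch and not the ISaaC implementation. It assumes one plausible instantiation: each variable is mapped to its normalized ranks (an empirical copula transform), and each pair of variables is scored with a biased empirical Hilbert-Schmidt Independence Criterion (HSIC) estimate under a Gaussian kernel. The function names (`copula_transform`, `gaussian_gram`, `hsic`, `dependence_network`) and the bandwidth `sigma=0.5` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def copula_transform(x):
    """Map a 1-D sample to normalized ranks in (0, 1] (empirical copula transform)."""
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n; ties broken by position
    return ranks / len(x)


def gaussian_gram(u, sigma=0.5):
    """Gaussian-kernel Gram matrix of a 1-D sample (sigma is an assumed bandwidth)."""
    sq_dists = (u[:, None] - u[None, :]) ** 2
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(u, v, sigma=0.5):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(u)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = gaussian_gram(u, sigma)
    L = gaussian_gram(v, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def dependence_network(X, sigma=0.5):
    """Pairwise dependence scores for the columns of X (samples x variables).

    Returns a symmetric matrix; the diagonal is left at zero.
    """
    U = np.apply_along_axis(copula_transform, 0, X)  # rank-transform each column
    p = X.shape[1]
    W = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            W[i, j] = W[j, i] = hsic(U[:, i], U[:, j], sigma)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    # Column 1 depends on column 0; column 2 is independent of both.
    X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
    print(np.round(dependence_network(X), 4))
```

Because the scores are computed on ranks, they are unchanged by strictly increasing transformations of any single variable (for example, taking logarithms of expression values), which is one motivation for working on the copula scale. Turning the score matrix into a network would additionally require a thresholding or ranking step, which this sketch leaves out.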

[1]  Chris Wiggins,et al.  ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context , 2004, BMC Bioinformatics.

[2]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[3]  M. Reinders,et al.  Genetic network modeling. , 2002, Pharmacogenomics.

[4]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003 .

[5]  Jonathan E. Clark,et al.  Co-expression network analysis identifies Spleen Tyrosine Kinase (SYK) as a candidate oncogenic driver in a subset of small-cell lung cancer , 2013, BMC Systems Biology.

[6]  Barnabás Póczos,et al.  Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs , 2010, NIPS.

[7]  H. Noushmehr,et al.  RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes , 2018, Nucleic acids research.

[8]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[9]  Dario Floreano,et al.  Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods , 2009, J. Comput. Biol..

[10]  R. Fortet,et al.  Convergence de la répartition empirique vers la répartition théorique , 1953 .

[11]  Dario Floreano,et al.  GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods , 2011, Bioinform..

[12]  Diogo M. Camacho,et al.  Wisdom of crowds for robust gene network inference , 2012, Nature Methods.

[13]  Claude E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[14]  Patrick J. Paddison,et al.  Causal Mechanistic Regulatory Network for Glioblastoma Deciphered Using Systems Genetics Network Analysis. , 2016, Cell systems.

[15]  Ingo Steinwart,et al.  On the Influence of the Kernel on the Consistency of Support Vector Machines , 2002, J. Mach. Learn. Res..

[16]  N. D. Clarke,et al.  Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PloS one.

[17]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[18]  Pei Wang,et al.  Integrative random forest for gene regulatory network inference , 2015, Bioinform..

[19]  Alexander J. Smola,et al.  The kernel mutual information , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[20]  Guy Karlebach,et al.  Modelling and analysis of gene regulatory networks , 2008, Nature Reviews Molecular Cell Biology.

[21]  Boleslaw K. Szymanski,et al.  Some Properties of the Gaussian Kernel for One Class Learning , 2007, ICANN.

[22]  Kevin Y. Yip,et al.  Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data , 2010, PloS one.

[23]  Johan A. K. Suykens,et al.  Representative subsets for big data learning using k-NN graphs , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[24]  Harold S. Blackman,et al.  Test Reliability and Homogeneity From the Perspective of the Ordinal Test Theory , 1988 .

[25]  B. Schweizer,et al.  On Nonparametric Measures of Dependence for Random Variables , 1981 .

[26]  E. Oja,et al.  Independent Component Analysis , 2013 .

[27]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[28]  D. Floreano,et al.  Revealing strengths and weaknesses of methods for gene network inference , 2010, Proceedings of the National Academy of Sciences.

[29]  R. Nelsen An Introduction to Copulas , 1998 .

[30]  Tomasz Arodz,et al.  ENNET: inferring large gene regulatory networks from expression data using gradient boosting , 2013, BMC Systems Biology.

[31]  A. Rényi On Measures of Entropy and Information , 1961 .

[32]  A. G. de la Fuente,et al.  From Knockouts to Networks: Establishing Direct Cause-Effect Relationships through Graph Analysis , 2010, PloS one.

[33]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[34]  Bernhard Schölkopf,et al.  A kernel-based causal learning algorithm , 2007, ICML '07.

[35]  C. Tsallis Possible generalization of Boltzmann-Gibbs statistics , 1988 .

[36]  Johan A. K. Suykens,et al.  Very Sparse LSSVM Reductions for Large-Scale Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.