Structured Sparse Boosting for Graph Classification

Boosting is a highly effective technique that combines weak classifiers (also called base learners) linearly to obtain high-quality classification models. In this article, we propose a generalized logit boost algorithm in which the base learners have structural relationships in the functional space. Although such relationships are generic, our work is particularly motivated by the emerging topic of pattern-based classification for semistructured data, including graphs. To incorporate the structure information efficiently, we design a general model in which an undirected graph captures the relationships among subgraph-based base learners. Our method applies both L1 and Laplacian-based L2 regularization to logit boosting to achieve model sparsity and smoothness in the functional space spanned by the base learners. We derive efficient optimization algorithms based on coordinate descent for the new boosting formulation, and we prove theoretically that the formulation exhibits a natural grouping effect for spatially nearby or overlapping base learners and that the resulting estimator is consistent. Additionally, motivated by the connection between logit boosting and logistic regression, we extend our structured sparse regularization framework to logistic regression for vectorial data in which the features are structured. A comprehensive experimental study, including comparisons with state-of-the-art methods, demonstrates the effectiveness of the proposed learning method.
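As a rough illustration of the logistic-regression extension described above, the following sketch fits a logistic model with an L1 penalty plus a graph-Laplacian L2 penalty by cyclic coordinate descent. It is not the authors' algorithm: the objective scaling, the penalty weights `lam1` and `lam2`, the curvature-bound (majorization) update, and all function names are assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def soft_threshold(z, t):
    # Proximal operator of t * |.|, the source of exact zeros.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def laplacian(p, edges):
    # Unweighted graph Laplacian L = D - A over p features.
    L = [[0.0] * p for _ in range(p)]
    for j, k in edges:
        L[j][j] += 1.0
        L[k][k] += 1.0
        L[j][k] -= 1.0
        L[k][j] -= 1.0
    return L

def objective(X, y, beta, edges, lam1, lam2):
    # Assumed objective: mean logistic loss + lam1 * ||beta||_1
    # + lam2 * sum over edges (beta_j - beta_k)^2  (= lam2 * beta^T L beta).
    n = len(X)
    loss = sum(log1pexp(-yi * sum(b * v for b, v in zip(beta, xi)))
               for xi, yi in zip(X, y)) / n
    l1 = sum(abs(b) for b in beta)
    smooth = sum((beta[j] - beta[k]) ** 2 for j, k in edges)
    return loss + lam1 * l1 + lam2 * smooth

def fit(X, y, edges, lam1=0.05, lam2=0.1, sweeps=200):
    # Cyclic coordinate descent. Each coordinate takes a soft-thresholded
    # step using an upper bound on the curvature of the smooth part,
    # so the objective never increases.
    n, p = len(X), len(X[0])
    L = laplacian(p, edges)
    beta = [0.0] * p
    margins = [0.0] * n  # y_i * <x_i, beta>, kept up to date
    for _ in range(sweeps):
        for j in range(p):
            curv = sum(xi[j] ** 2 for xi in X) / (4.0 * n) + 2.0 * lam2 * L[j][j]
            if curv == 0.0:
                continue  # all-zero, isolated feature: nothing to update
            grad = -sum(yi * xi[j] * sigmoid(-m)
                        for xi, yi, m in zip(X, y, margins)) / n
            grad += 2.0 * lam2 * sum(L[j][k] * beta[k] for k in range(p))
            new = soft_threshold(beta[j] - grad / curv, lam1 / curv)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    margins[i] += y[i] * X[i][j] * delta
                beta[j] = new
    return beta

# Toy data (hypothetical): features 0 and 1 are identical copies of the
# label and are linked in the feature graph; feature 2 is uninformative.
X = [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0], [-1.0, -1.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]
edges = [(0, 1)]
beta = fit(X, y, edges)
```

On this toy problem the two linked, identical features end up with equal coefficients, which is a small instance of the grouping effect, while the L1 term leaves the uninformative feature at exactly zero.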
