Stacking of Network Based Classifiers with Application in Breast Cancer Classification

In this study we use existing biological knowledge, in the form of biological networks, to construct a two-level classification scheme. At the first level, base classifiers are built from a given list of candidate “biomarkers” and the topology of the biological network: a search strategy based on random walks over the network selects the genes used in each classifier. At the second level, a meta-classifier is trained to combine the outputs of the base classifiers. The proposed approach therefore aims to strengthen the classification ability of the initial gene list and to provide more robust generalization. We explain the methodology in full detail and present promising results in breast cancer-related scenarios.
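The two-level scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact random-walk strategy, the base learners, or the meta-classifier, so the sketch assumes a plain unweighted random walk, a toy nearest-centroid classifier at both levels, and a separate held-out split for training the meta-classifier. All names (`random_walk_genes`, `NearestCentroid`, `fit_stack`, `predict_stack`) and the four-gene toy data are hypothetical.

```python
import random

def random_walk_genes(network, seed, n_steps, rng):
    """Collect the genes visited by a simple random walk started at `seed`."""
    visited = {seed}
    current = seed
    for _ in range(n_steps):
        neighbours = network.get(current, [])
        if not neighbours:
            break
        current = rng.choice(neighbours)
        visited.add(current)
    return sorted(visited)

class NearestCentroid:
    """Toy stand-in classifier: predicts the class whose mean vector is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        labels = sorted(self.centroids)  # fixed order makes ties deterministic
        return [min(labels, key=lambda c: dist(x, self.centroids[c])) for x in X]

def project(X, genes, gene_index):
    """Keep only the expression columns of the selected genes."""
    return [[row[gene_index[g]] for g in genes] for row in X]

def base_outputs(base, X, gene_index):
    """One column of predictions per base classifier, one row per sample."""
    cols = [clf.predict(project(X, genes, gene_index)) for genes, clf in base]
    return [list(row) for row in zip(*cols)]

def fit_stack(X_base, y_base, X_meta, y_meta, network, seeds, gene_index,
              n_steps=3, rng_seed=0):
    rng = random.Random(rng_seed)
    base = []
    for seed in seeds:  # level 1: one base classifier per candidate biomarker
        genes = random_walk_genes(network, seed, n_steps, rng)
        clf = NearestCentroid().fit(project(X_base, genes, gene_index), y_base)
        base.append((genes, clf))
    # level 2: the meta-classifier learns from base predictions on held-out data
    meta = NearestCentroid().fit(base_outputs(base, X_meta, gene_index), y_meta)
    return base, meta

def predict_stack(base, meta, X, gene_index):
    return meta.predict(base_outputs(base, X, gene_index))

# Hypothetical toy data: genes A and B separate the classes, C and D do not.
genes = ["A", "B", "C", "D"]
gene_index = {g: i for i, g in enumerate(genes)}
network = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
X_base = [[5, 5, 1, 0], [6, 5, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1]]
y_base = [1, 1, 0, 0]
X_meta = [[5, 6, 0, 0], [6, 6, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
y_meta = [1, 1, 0, 0]

base, meta = fit_stack(X_base, y_base, X_meta, y_meta, network,
                       seeds=["A", "C"], gene_index=gene_index)
predictions = predict_stack(base, meta, [[6, 5, 1, 1], [0, 1, 0, 0]], gene_index)
print(predictions)
```

Training the meta-classifier on samples the base classifiers did not see during fitting keeps the second level from simply memorizing first-level training error. In practice the held-out predictions would more likely come from cross-validation than from a single split.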
