Towards data-free gating of heterogeneous pre-trained neural networks

The combination and aggregation of knowledge from multiple neural networks is most commonly seen in mixtures of experts. However, such combinations usually involve networks trained on the same task, with little attention paid to combining heterogeneous pre-trained networks, especially in the data-free regime. The problem of combining pre-trained models in the absence of relevant datasets is likely to become increasingly important as machine learning continues to dominate the AI landscape and the number of useful but specialized models explodes. This paper proposes several data-free methods for combining heterogeneous neural networks, ranging from the use of simple output-logit statistics to the training of specialized gating networks. The gating networks decide whether a given input belongs to a given expert based on the nature of the activations that expert generates. The experiments revealed that the gating networks, including the universal gating approach, were the most accurate, and therefore represent a pragmatic step towards applications of heterogeneous mixtures of experts in a data-free regime. The code for this project is hosted on GitHub at https://github.com/cwkang1998/network-merging.
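
To make the routing idea concrete, below is a minimal sketch, assuming a PyTorch-style setup, of a gating network that scores a set of frozen, heterogeneous experts from simple statistics of their output logits and routes each input to the highest-scoring expert. The class and function names (`GatingNetwork`, `route`) and the choice of logit statistics as gating features are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatingNetwork(nn.Module):
    """Scores each expert from simple statistics of its output probabilities."""

    def __init__(self, num_experts: int, stats_per_expert: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_experts * stats_per_expert, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, expert_logits: list) -> torch.Tensor:
        # expert_logits[i] has shape (batch, num_classes_i); class counts may differ.
        stats = []
        for logits in expert_logits:
            probs = F.softmax(logits, dim=-1)
            stats += [probs.max(dim=-1).values, probs.mean(dim=-1), probs.std(dim=-1)]
        features = torch.stack(stats, dim=-1)          # (batch, num_experts * 3)
        return F.softmax(self.net(features), dim=-1)   # routing distribution over experts


def route(x: torch.Tensor, experts: list, gate: GatingNetwork) -> torch.Tensor:
    """Selects, per input, the expert the gate considers most likely to own it."""
    with torch.no_grad():                              # experts stay frozen (data-free regime)
        logits = [expert(x) for expert in experts]
    weights = gate(logits)                             # (batch, num_experts)
    return weights.argmax(dim=-1)                      # index of the chosen expert
```

In this sketch the experts themselves are never updated; only the small gate is trained, which is what makes a data-free (or near-data-free) setting plausible when the experts' original training sets are unavailable.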
