Unsupervised Ensemble Classification with Dependent Data

Ensemble learning, the machine learning paradigm where multiple algorithms are combined, has exhibited promising performance in a variety of tasks. The present work focuses on unsupervised ensemble classification. The term unsupervised refers to the ensemble combiner, which has no knowledge of the ground-truth labels that each classifier has been trained on. While most prior works on unsupervised ensemble classification are designed for independent and identically distributed (i.i.d.) data, the present work introduces an unsupervised scheme for learning from ensembles of classifiers in the presence of data dependencies. Two types of data dependencies are considered: sequential data, and networked data whose dependencies are captured by a graph. Moment matching and Expectation Maximization algorithms are developed for the aforementioned cases, and their performance is evaluated on synthetic and real datasets.
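To make the sequential setting concrete, the following is a minimal, illustrative sketch (not the paper's exact algorithm) of an Expectation Maximization combiner for classifier outputs on sequential data. It assumes the unknown label sequence follows a first-order Markov chain and that each classifier emits its prediction through a Dawid–Skene-style confusion matrix, conditionally independently given the true label. All names and symbols below (em_sequential_ensemble, F, K, Gamma, A, pi) are illustrative choices introduced here, not taken from the paper.

```python
# Illustrative EM sketch for unsupervised ensemble classification of sequential data,
# under the hedged assumptions stated above (Markov label chain, per-classifier
# confusion matrices, conditional independence of classifiers given the true label).
import numpy as np

def em_sequential_ensemble(F, K, n_iter=50):
    """F: (M, T) array of integer predictions in {0,...,K-1} from M classifiers
    over a sequence of length T. Returns estimated label posteriors (T, K)."""
    M, T = F.shape
    # Initialization: mildly diagonal confusion matrices, uniform Markov chain.
    Gamma = np.stack([np.full((K, K), 1.0 / K) + 0.5 * np.eye(K) for _ in range(M)])
    Gamma /= Gamma.sum(axis=2, keepdims=True)   # Gamma[m, k, j] = P(classifier m says j | label k)
    A = np.full((K, K), 1.0 / K)                # label transition matrix
    pi = np.full(K, 1.0 / K)                    # initial label distribution

    for _ in range(n_iter):
        # Emission likelihoods: B[t, k] proportional to prod_m P(F[m, t] | y_t = k).
        logB = np.zeros((T, K))
        for m in range(M):
            logB += np.log(Gamma[m][:, F[m]]).T
        B = np.exp(logB - logB.max(axis=1, keepdims=True))

        # E-step: scaled forward-backward recursion over the hidden label chain.
        alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)   # posterior P(y_t = k | all predictions)

        xi = np.zeros((K, K))                       # expected transition counts
        for t in range(T - 1):
            x = np.outer(alpha[t], B[t + 1] * beta[t + 1]) * A / c[t + 1]
            xi += x / x.sum()

        # M-step: refit the chain and the confusion matrices from the posteriors.
        pi = gamma[0]
        A = xi / xi.sum(axis=1, keepdims=True)
        for m in range(M):
            counts = np.full((K, K), 1e-12)
            for t in range(T):
                counts[:, F[m, t]] += gamma[t]
            Gamma[m] = counts / counts.sum(axis=1, keepdims=True)
    return gamma
```

If the data are treated as i.i.d. rather than sequential, the same sketch reduces to a classical Dawid–Skene-style EM: the forward-backward E-step collapses to per-sample posteriors computed from pi and the confusion matrices alone, and the transition matrix A is no longer needed.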
