Batch mode active learning for mitotic phenotypes using conformal prediction

Machine learning models are now ubiquitous in data analysis. Although the amount of data generated continues to increase exponentially, having an expert annotate enough objects with known labels remains expensive. To mitigate this, active learning approaches attempt to identify the objects whose labels will be most informative. Here, we introduce a batch-mode active learning framework for the pool-based setting, built around conformal predictors. We select objects to add to the labelled set based on their perceived novelty, while mitigating the risk of selecting highly correlated or outlying observations. We compare our approach with classical methods on an example UCI dataset, and demonstrate its application to a pharmaceutically relevant cellular imaging problem: classifying mitotic phenotypes. Our approach facilitates efficient discovery of rare and novel classes within large screening datasets.
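
The selection idea summarised above can be illustrated with a minimal sketch. The abstract does not specify the nonconformity measure or the exact selection criteria, so everything concrete here is an assumption made for illustration: a nearest-centroid nonconformity score, an inductive conformal p-value computed on a held-out calibration set, a nearest-neighbour isolation filter standing in for outlier mitigation, and a minimum-separation rule standing in for correlation mitigation. The function names and the parameters `min_sep` and `max_isolation` are hypothetical.

```python
import math

def centroids(X, y):
    """Per-class mean vectors of the labelled training objects."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def p_values(x, cents, cal_scores):
    """Inductive conformal p-value of x for each candidate label.

    Nonconformity (an assumption of this sketch): distance to the class centroid.
    """
    out = {}
    for label, c in cents.items():
        score = math.dist(x, c)
        at_least = sum(1 for s in cal_scores if s >= score)
        out[label] = (at_least + 1) / (len(cal_scores) + 1)
    return out

def select_batch(pool, cents, cal_scores, batch_size, min_sep, max_isolation):
    """Greedily pick the most novel pool objects (lowest credibility)."""
    ranked = []
    for i, x in enumerate(pool):
        # Outlier guard: skip objects with no nearby neighbour in the pool.
        isolation = min(math.dist(x, z) for j, z in enumerate(pool) if j != i)
        if isolation > max_isolation:
            continue
        credibility = max(p_values(x, cents, cal_scores).values())
        ranked.append((credibility, i))  # ties are broken by pool index
    ranked.sort()
    chosen = []
    for _, i in ranked:
        if len(chosen) >= batch_size:
            break
        # Correlation guard: skip near-duplicates of objects already chosen.
        if all(math.dist(pool[i], pool[j]) >= min_sep for j in chosen):
            chosen.append(i)
    return chosen

# Toy data: two known classes, plus an unseen cluster and a lone outlier.
train_X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9)]
train_y = ["a", "a", "b", "b"]
cal_X = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.2, 1.0)]
cal_y = ["a", "a", "b", "b"]
pool = [(0.1, 0.1), (1.0, 0.9), (3.0, 3.0), (3.05, 3.0), (3.0, 3.4), (10.0, -10.0)]

cents = centroids(train_X, train_y)
cal_scores = [math.dist(x, cents[label]) for x, label in zip(cal_X, cal_y)]
chosen = select_batch(pool, cents, cal_scores, batch_size=2,
                      min_sep=0.3, max_isolation=2.0)
print(chosen)
```

In this toy pool, the objects near the two known classes receive high credibility and are ranked last, the isolated point is filtered out before ranking, and of the two near-identical objects in the unseen cluster only one is eligible for the batch. A real application would replace the centroid score with a nonconformity measure derived from the underlying classifier.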
