Handling data irregularities in classification: Foundations, trends, and future challenges

Abstract Most traditional pattern classifiers assume that their input data are well-behaved: similar underlying class distributions, classes of balanced size, a full set of observed features in every data instance, and so on. Practical datasets, however, exhibit various forms of irregularities that are very often sufficient to confuse a classifier, degrading its ability to learn from the data. In this article, we provide a bird’s-eye view of such data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities. We then discuss the notable and recent approaches taken to make existing stand-alone and ensemble classifiers robust against such irregularities. We also discuss the interrelations and co-occurrences of the data irregularities, including class imbalance, small disjuncts, class skew, missing features, and absent (non-existing or undefined) features. Finally, we identify a number of interesting future research avenues that are equally relevant to the regular and the deep machine learning paradigms.
