Deficient data classification with fuzzy learning

This thesis first proposes a novel algorithm that handles both missing values and imbalanced data in classification. It then proposes algorithms that address the class imbalance problem in Twitter spam detection, a network security problem. Finally, it simulates and analyses the security profile of SVM against deliberate attacks.
