Data preprocessing in predictive data mining
Sotiris B. Kotsiantis | Michael N. Vrahatis | Stamatios-Aggelos N. Alexandropoulos
[1] Nitesh V. Chawla et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.
[2] Srinivasan Parthasarathy et al. Fast mining of distance-based outliers in high-dimensional datasets, 2008, Data Mining and Knowledge Discovery.
[3] Tzu-Tsung Wong et al. A hybrid discretization method for naïve Bayesian classifiers, 2012, Pattern Recognit.
[4] Heiko Hoffmann et al. Kernel PCA for novelty detection, 2007, Pattern Recognit.
[5] José Cristóbal Riquelme Santos et al. On the evolutionary optimization of k-NN by label-dependent feature weighting, 2012, Pattern Recognit. Lett.
[6] Tony R. Martinez et al. Reduction Techniques for Instance-Based Learning Algorithms, 2000, Machine Learning.
[7] Isabelle Guyon et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.
[8] Peter Filzmoser et al. Outlier identification in high dimensions, 2008, Comput. Stat. Data Anal.
[9] Chengqi Zhang et al. POP algorithm: Kernel-based imputation to treat missing values in knowledge discovery from databases, 2009, Expert Syst. Appl.
[10] Min Li et al. An effective discretization based on Class-Attribute Coherence Maximization, 2011, Pattern Recognit. Lett.
[11] Qinghua Hu et al. Feature evaluation and selection based on neighborhood soft margin, 2010, Neurocomputing.
[12] Kun Chang Lee et al. Adaptive pairing of classifier and imputation methods based on the characteristics of missing values in data sets, 2016, Expert Syst. Appl.
[13] Fernando Nogueira et al. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning, 2016, J. Mach. Learn. Res.
[14] Ireneusz Czarnowski et al. Prototype selection algorithms for distributed learning, 2010, Pattern Recognit.
[15] Fei Tony Liu et al. Isolation-Based Anomaly Detection, 2012, TKDD.
[16] Claudomiro Sales et al. Multi-objective genetic algorithm for missing data imputation, 2015, Pattern Recognit. Lett.
[17] João Miguel da Costa Sousa et al. Missing data in medical databases: Impute, delete or classify?, 2013, Artif. Intell. Medicine.
[18] José Salvador Sánchez et al. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance, 2012, Knowl. Based Syst.
[19] Richard Weber et al. A wrapper method for feature selection using Support Vector Machines, 2009, Inf. Sci.
[20] Charu C. Aggarwal et al. An Introduction to Outlier Analysis, 2013.
[21] K. Mehrotra et al. A clustering-based discretization for supervised learning, 2010.
[22] Hiroshi Motoda et al. Computational Methods of Feature Selection, 2022.
[23] Peter A. N. Bosman et al. Symbolic regression and feature construction with GP-GOMEA applied to radiotherapy dose reconstruction of childhood cancer survivors, 2018, GECCO.
[24] Taghi M. Khoshgoftaar et al. Class noise detection using frequent itemsets, 2006, Intell. Data Anal.
[25] Ireneusz Czarnowski. Cluster-based instance selection for machine classification, 2010, Knowledge and Information Systems.
[26] Selwyn Piramuthu et al. Evaluating feature selection methods for learning in data mining applications, 2004.
[27] Nicolás García-Pedrajas et al. A cooperative coevolutionary algorithm for instance selection for instance-based learning, 2010, Machine Learning.
[28] Loris Nanni et al. Prototype reduction techniques: A comparison among different approaches, 2011, Expert Syst. Appl.
[29] Esther-Lydia Silva-Ramírez et al. Missing value imputation on missing completely at random data using multilayer perceptrons, 2011, Neural Networks.
[30] Francisco Herrera et al. A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[31] Francisco Herrera et al. Subgroup discover in large size data sets preprocessed using stratified instance selection for increasing the presence of minority classes, 2008, Pattern Recognit. Lett.
[32] Neil D. Lawrence et al. Dataset Shift in Machine Learning, 2009.
[33] Lukasz A. Kurgan et al. Discretization as the enabling technique for the Naïve Bayes and semi-Naïve Bayes-based classification, 2010, Knowl. Eng. Rev.
[34] Francisco Herrera et al. A Survey on Evolutionary Instance Selection and Generation, 2010, Int. J. Appl. Metaheuristic Comput.
[35] T. Kathirvalavakumar et al. A new discretization algorithm based on range coefficient of dispersion and skewness for neural networks classifier, 2012, Appl. Soft Comput.
[36] Xindong Wu et al. Mining With Noise Knowledge: Error-Aware Data Mining, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
[37] Carlos Soares et al. Entropy-based discretization methods for ranking data, 2016, Inf. Sci.
[38] B. John Oommen et al. A brief taxonomy and ranking of creative prototype reduction schemes, 2003, Pattern Analysis & Applications.
[39] Ronald K. Pearson et al. Mining imperfect data: dealing with contamination and incomplete records, 2005.
[40] Xingquan Zhu et al. Class Noise vs. Attribute Noise: A Quantitative Study, 2003, Artificial Intelligence Review.
[41] Geoffrey E. Hinton et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[42] José Francisco Martínez Trinidad et al. A review of instance selection methods, 2010, Artificial Intelligence Review.
[43] Bingru Yang et al. A SVM Regression Based Approach to Filling in Missing Values, 2005, KES.
[44] José Ramón Cano et al. Strategies for Scaling Up Evolutionary Instance Reduction Algorithms for Data Mining, 2005.
[45] Francisco Herrera et al. A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning, 2013, IEEE Transactions on Knowledge and Data Engineering.
[46] Krzysztof J. Cios et al. ur-CAIM: improved CAIM discretization for unbalanced and balanced data, 2016, Soft Comput.
[47] Lawrence O. Hall et al. Active cleaning of label noise, 2016, Pattern Recognit.
[48] K. Z. Mao et al. Orthogonal forward selection and backward elimination algorithms for feature subset selection, 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[49] Davy Janssens et al. Evaluating the performance of cost-based discretization versus entropy- and error-based discretization, 2006, Comput. Oper. Res.
[50] Lukasz A. Kurgan et al. Impact of imputation of missing values on classification error for discrete data, 2008, Pattern Recognit.
[51] Wei Wang et al. A comparison of outlier detection algorithms for ITS data, 2010, Expert Syst. Appl.
[52] José Francisco Martínez Trinidad et al. InstanceRank based on borders for instance selection, 2013, Pattern Recognit.
[53] Francisco Herrera et al. A survey on data preprocessing for data stream mining, 2017.
[54] Nicolás García-Pedrajas et al. A divide-and-conquer recursive approach for scaling up instance selection algorithms, 2009, Data Mining and Knowledge Discovery.
[55] David B. Skillicorn et al. Distributed prediction from vertically partitioned data, 2008, J. Parallel Distributed Comput.
[56] Flavio Manenti et al. Outlier detection in large data sets, 2011, Comput. Chem. Eng.
[57] Donghai Guan et al. Identifying mislabeled training data with the aid of unlabeled data, 2011, Applied Intelligence.
[58] Taeho Jo et al. A Multiple Resampling Method for Learning from Imbalanced Data Sets, 2004, Comput. Intell.
[59] Tapio Elomaa et al. Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates, 2004, Data Mining and Knowledge Discovery.
[60] Francisco Herrera et al. A survey on data preprocessing for data stream mining: Current status and future directions, 2017, Neurocomputing.
[61] Wei-Pang Yang et al. A discretization algorithm based on Class-Attribute Contingency Coefficient, 2008, Inf. Sci.
[62] Derek Greene et al. Missing value imputation for epistatic MAPs, 2010, BMC Bioinformatics.
[63] James C. Bezdek et al. Nearest prototype classifier designs: An experimental study, 2001, Int. J. Intell. Syst.
[64] Pavel Pudil et al. Feature selection toolbox, 2002, Pattern Recognit.
[65] Stan Matwin et al. Ensembles of label noise filters: a ranking approach, 2016, Data Mining and Knowledge Discovery.
[66] Thomas Reinartz et al. A Unifying View on Instance Selection, 2002, Data Mining and Knowledge Discovery.
[67] Sotiris B. Kotsiantis et al. Hybrid local boosting utilizing unlabeled data in classification tasks, 2017, Evolving Systems.
[68] Francisco Herrera et al. INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control, 2016, Inf. Fusion.
[69] Brian Mac Namee et al. Profiling instances in noise reduction, 2012, Knowl. Based Syst.
[70] Robert P. W. Duin et al. Prototype selection for dissimilarity-based classifiers, 2006, Pattern Recognit.
[71] Huaiqing Wang et al. A discretization algorithm based on a heterogeneity criterion, 2005, IEEE Transactions on Knowledge and Data Engineering.
[72] Huan Liu et al. Discretization: An Enabling Technique, 2002, Data Mining and Knowledge Discovery.
[73] Taghi M. Khoshgoftaar et al. The pairwise attribute noise detection algorithm, 2007, Knowledge and Information Systems.
[74] Larry Bull et al. Genetic Programming with a Genetic Algorithm for Feature Construction and Selection, 2005, Genetic Programming and Evolvable Machines.
[75] Francisco Herrera et al. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics, 2012, Expert Syst. Appl.
[76] André Carlos Ponce de Leon Ferreira de Carvalho et al. Noise detection in the meta-learning level, 2016, Neurocomputing.
[77] M. A. H. Farquad et al. Preprocessing unbalanced data using support vector machine, 2012, Decis. Support Syst.
[78] Wenyong Wang et al. An efficient instance selection algorithm to reconstruct training set for support vector machine, 2017, Knowl. Based Syst.
[79] Verónica Bolón-Canedo et al. Data discretization: taxonomy and big data challenge, 2016, WIREs Data Mining Knowl. Discov.
[80] Sven F. Crone et al. The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing, 2006, Eur. J. Oper. Res.
[81] Hugo Jair Escalante et al. A Comparison of Outlier Detection Algorithms for Machine Learning, 2005.
[82] Li Feng et al. Supervised and Adaptive Feature Weighting for Object-Based Classification on Satellite Images, 2018, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
[83] Katsumi Inoue et al. Relational Reinforcement Learning for Planning with Exogenous Effects, 2017.
[84] Javier Pérez-Rodríguez et al. Multi-selection of instances: A straightforward way to improve evolutionary instance selection, 2012, Appl. Soft Comput.
[85] Taghi M. Khoshgoftaar et al. Knowledge discovery from imbalanced and noisy data, 2009, Data Knowl. Eng.
[86] Enrico Blanzieri et al. Noise reduction for instance-based learning with a local maximal margin approach, 2010, Journal of Intelligent Information Systems.
[87] Jose Miguel Puerta et al. Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter?, 2011, Applied Intelligence.
[88] Ratna Babu Chinnam et al. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification, 2011, Inf. Sci.
[89] Juan José Rodríguez Diez et al. Instance selection of linear complexity for big data, 2016, Knowl. Based Syst.
[90] Chih-Fong Tsai et al. Combining instance selection for better missing value imputation, 2016, J. Syst. Softw.
[91] Xiaofeng Zhu et al. Missing data imputation by utilizing information within incomplete instances, 2011, J. Syst. Softw.
[92] Xindong Wu et al. Discretization Methods, 2010, Data Mining and Knowledge Discovery Handbook.
[93] Gustavo E. A. P. A. Batista et al. A study of the behavior of several methods for balancing machine learning training data, 2004, SIGKDD Explor.
[94] Tommy W. S. Chow et al. Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information, 2005, IEEE Transactions on Neural Networks.
[95] Leonardo Franco et al. Missing data imputation using statistical and machine learning methods in a real breast cancer problem, 2010, Artif. Intell. Medicine.
[96] Andrew K. C. Wong et al. Classification of Imbalanced Data: a Review, 2009, Int. J. Pattern Recognit. Artif. Intell.
[97] Quan Pan et al. Adaptive imputation of missing values for incomplete pattern classification, 2016, Pattern Recognit.
[98] Fabrizio Angiulli et al. Exploiting domain knowledge to detect outliers, 2013, Data Mining and Knowledge Discovery.
[99] Dorian Pyle et al. Data Preparation for Data Mining, 1999.
[100] อนิรุธ สืบสิงห์ et al. Data Mining: Practical Machine Learning Tools and Techniques, 2014.
[101] Yumin Chen et al. Neighborhood outlier detection, 2010, Expert Syst. Appl.
[102] Francisco Herrera et al. An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes, 2011, Pattern Recognit.
[103] Francisco Herrera et al. Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[104] Francisco Herrera et al. Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification, 2013, Pattern Recognit.
[105] Francisco Herrera et al. A unifying view on dataset shift in classification, 2012, Pattern Recognit.
[106] Zixiang Xiong et al. Optimal number of features as a function of sample size for various classification rules, 2005, Bioinform.
[107] William Eberle et al. Learning to detect representative data for large scale instance selection, 2015, J. Syst. Softw.
[108] Hossein Nezamabadi-pour et al. Using fuzzy-rough set feature selection for feature construction based on genetic programming, 2018, 2018 3rd Conference on Swarm Intelligence and Evolutionary Computation (CSIEC).
[109] Michael N. Vrahatis et al. Particle Swarm Optimization and Intelligence: Advances and Applications, 2010.
[110] Shichao Zhang et al. Shell-neighbor method and its application in missing data imputation, 2011, Applied Intelligence.
[111] Haibo He et al. Learning from Imbalanced Data, 2009, IEEE Transactions on Knowledge and Data Engineering.
[112] Renato Cordeiro de Amorim et al. Feature weighting as a tool for unsupervised feature selection, 2018, Inf. Process. Lett.
[113] María José del Jesús et al. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets, 2017, Int. J. Neural Syst.
[114] Ron Kohavi et al. Supervised and Unsupervised Discretization of Continuous Features, 1995, ICML.
[115] Sotiris B. Kotsiantis et al. Combining Prototype Selection with Local Boosting, 2016, AIAI.
[116] Clara Pizzuti et al. Outlier mining in large high-dimensional data sets, 2005, IEEE Transactions on Knowledge and Data Engineering.
[117] Francisco Herrera et al. On the choice of the best imputation methods for missing values considering three groups of classification methods, 2012, Knowledge and Information Systems.
[118] Marc Boullé et al. Khiops: A Statistical Discretization Method of Continuous Attributes, 2004, Machine Learning.
[119] Elena Marchiori et al. Hit Miss Networks with Applications to Instance Selection, 2008, J. Mach. Learn. Res.
[120] Riyaz Sikora et al. Iterative feature construction for improving inductive learning algorithms, 2009, Expert Syst. Appl.
[121] Ruoming Jin et al. Data discretization unification, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[122] Antonio González Muñoz et al. Combining instance selection methods based on data characterization: An approach to increase their effectiveness, 2011, Inf. Sci.
[123] Dong-Chul Park. Centroid Neural Network with Weighted Features, 2009, J. Circuits Syst. Comput.
[124] Lukasz A. Kurgan et al. CAIM discretization algorithm, 2004, IEEE Transactions on Knowledge and Data Engineering.
[125] Chengqi Zhang et al. Missing Value Imputation Based on Data Clustering, 2008, Trans. Comput. Sci.
[126] Fan Ming-hui. Review of Outlier Detection, 2006.
[127] Heiko Wersing et al. Incremental on-line learning: A review and comparison of state of the art algorithms, 2018, Neurocomputing.
[128] David Zhang et al. Hand-Geometry Recognition Using Entropy-Based Discretization, 2007, IEEE Transactions on Information Forensics and Security.
[129] Ralf Klinkenberg et al. Learning drifting concepts: Example selection vs. example weighting, 2004, Intell. Data Anal.
[130] Huan Liu et al. On Issues of Instance Selection, 2002, Data Mining and Knowledge Discovery.
[131] Hong Shen et al. Multi-criteria feature selection on cost-sensitive data with missing values, 2016, Pattern Recognit.
[132] Luis González Abril et al. Ameva: An autonomous discretization algorithm, 2009, Expert Syst. Appl.
[133] Chuanhe Shen et al. Feature Weighting of Support Vector Machines Based on Derivative Saliency Analysis and Its Application to Financial Data Mining, 2012.
[134] James Kennedy et al. Particle swarm optimization, 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.
[135] Francisco Herrera et al. A memetic algorithm for evolutionary prototype selection: A scaling up approach, 2008, Pattern Recognit.
[136] Ming-jian Zhou et al. An Outlier Mining Algorithm Based on Dissimilarity, 2012.
[137] Francisco Herrera et al. IFS-CoCo: Instance and feature selection based on cooperative coevolution with nearest neighbor rule, 2010, Pattern Recognit.
[138] Rich Caruana et al. Benefitting from the Variables that Variable Selection Discards, 2003, J. Mach. Learn. Res.
[139] Bokyoung Kang et al. Fast outlier detection for very large log data, 2011, Expert Syst. Appl.
[140] Francisco Herrera et al. Data Preprocessing in Data Mining, 2014, Intelligent Systems Reference Library.
[141] Witold Pedrycz et al. A Novel Framework for Imputation of Missing Values in Databases, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
[142] Francisco Herrera et al. Integrating a differential evolution feature weighting scheme into prototype generation, 2012, Neurocomputing.
[143] Edward R. Dougherty et al. Performance of feature-selection methods in the classification of high-dimension data, 2009, Pattern Recognit.
[144] Q. Henry Wu et al. A class boundary preserving algorithm for data condensation, 2011, Pattern Recognit.
[145] David W. Aha et al. A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms, 1997, Artificial Intelligence Review.