Paper Information - Big Data Preprocessing: Enabling Smart Data

Big Data Preprocessing: Enabling Smart Data

[1] Francisco Herrera,et al. Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries , 2019, IoTBDS.

[2] Naixue Xiong,et al. A novel code data dissemination scheme for Internet of Things through mobile vehicle of smart cities , 2019, Future Gener. Comput. Syst..

[3] Francisco Herrera,et al. Brightness guided preprocessing for automatic cold steel weapon detection in surveillance videos with deep learning , 2019, Neurocomputing.

[4] Francisco Herrera,et al. SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data , 2018, J. Comput. Sci. Technol..

[5] Francisco Herrera,et al. Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data , 2018, WIREs Data Mining Knowl. Discov..

[6] Mario Piattini,et al. From big data to smart data: a data quality perspective , 2018, EnSEmble@ESEC/SIGSOFT FSE.

[7] Francisco Herrera,et al. DPASF: a flink library for streaming data preprocessing , 2018, Big Data Analytics.

[8] S. García,et al. Online entropy-based discretization for data streaming classification , 2018, Future Generation Computer Systems.

[9] Verónica Bolón-Canedo,et al. An Information Theory-Based Feature Selection Framework for Big Data Under Apache Spark , 2018, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[10] Francisco Herrera,et al. Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce , 2018, Inf. Fusion.

[11] Francisco Herrera,et al. Principal Components Analysis Random Discretization Ensemble for Big Data , 2018, Knowl. Based Syst..

[12] Francisco Herrera,et al. A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark , 2018, Swarm Evol. Comput..

[13] Luis de Marcos,et al. Distributed ReliefF-based feature selection in Spark , 2018, Knowledge and Information Systems.

[14] Nitin Narang,et al. Imbalanced big data classification: a distributed implementation of SMOTE , 2018, ICDCN Workshops.

[15] Luis Perez,et al. The Effectiveness of Data Augmentation in Image Classification using Deep Learning , 2017, ArXiv.

[16] María José del Jesús,et al. KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining , 2017, Int. J. Comput. Intell. Syst..

[17] Sergio Ramírez-Gallego,et al. Nearest Neighbor Classification for High-Speed Big Data Streams Using Spark , 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[18] Md. Zakirul Alam Bhuiyan,et al. A Survey on Deep Learning in Big Data , 2017, 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC).

[19] Robert Ivor John,et al. An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots , 2017, IEEE Transactions on Emerging Topics in Computational Intelligence.

[20] Jun-Hai Zhai,et al. The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers , 2015, International Journal of Machine Learning and Cybernetics.

[21] Francisco Herrera,et al. SMOTE-GPU: Big Data preprocessing on commodity hardware for imbalanced classification , 2017, Progress in Artificial Intelligence.

[22] Francisco Herrera,et al. Enabling Smart Data: Noise filtering in Big Data classification , 2017, Inf. Sci..

[23] Francisco Herrera,et al. An insight into imbalanced Big Data classification: outcomes and challenges , 2017 .

[24] Álvar Arnaiz-González,et al. MR-DIS: democratic instance selection for big data by MapReduce , 2017, Progress in Artificial Intelligence.

[25] Verónica Bolón-Canedo,et al. Fast‐mRMR: Fast Minimum Redundancy Maximum Relevance Algorithm for High‐Dimensional Big Data , 2017, Int. J. Intell. Syst..

[26] Francisco Herrera,et al. kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data , 2017, Knowl. Based Syst..

[27] Francisco Herrera,et al. GPU-SME-kNN: Scalable and memory efficient kNN and lazy learning using GPUs , 2016, Inf. Sci..

[28] Huan Liu,et al. Challenges of Feature Selection for Big Data Analytics , 2016, IEEE Intelligent Systems.

[29] Francisco Herrera,et al. Big data preprocessing: methods and prospects , 2016 .

[30] Arun Sharma,et al. Scalable machine‐learning algorithms for big data analytics: a comprehensive review , 2016, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..

[31] Maoguo Gong,et al. RBoost: Label Noise-Robust Boosting Algorithm Based on a Nonconvex Loss Function and the Numerically Stable Base Learners , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[32] Vasyl Lytvyn,et al. Smart Data Integration by Goal Driven Ontology Learning , 2016, INNS Conference on Big Data.

[33] Mark D. McDonnell,et al. Understanding Data Augmentation for Classification: When to Warp? , 2016, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[34] Nadia Essoussi,et al. A Parallel Implementation of Relief Algorithm Using Mapreduce Paradigm , 2016, ICCCI.

[35] Juan José Rodríguez Diez,et al. Instance selection of linear complexity for big data , 2016, Knowl. Based Syst..

[36] Xin Yao,et al. A Survey on Evolutionary Computation Approaches to Feature Selection , 2016, IEEE Transactions on Evolutionary Computation.

[37] Francisco Herrera,et al. Evolutionary undersampling for extremely imbalanced big data classification under apache spark , 2016, 2016 IEEE Congress on Evolutionary Computation (CEC).

[38] Nilanjan Dey,et al. A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset , 2016, Comput. Methods Programs Biomed..

[39] Weisong Shi,et al. Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.

[40] Bartosz Krawczyk,et al. GPU-Accelerated Extreme Learning Machines for Imbalanced Data Streams with Concept Drift , 2016, ICCS.

[41] Jim Austin,et al. Hadoop neural network for parallel and distributed feature selection , 2016, Neural Networks.

[42] Francisco Herrera,et al. Tutorial on practical tips of the most influential data preprocessing algorithms in data mining , 2016, Knowl. Based Syst..

[43] Francesco Marcelloni,et al. A MapReduce solution for associative classification of big data , 2016, Inf. Sci..

[44] Francisco Herrera,et al. Multivariate Discretization Based on Evolutionary Cut Points Selection for Classification , 2016, IEEE Transactions on Cybernetics.

[45] C. Giraud-Carrier,et al. Efficient mining of high-speed uncertain data streams , 2015, Applied Intelligence.

[46] Santanu Kumar Rath,et al. Classification of microarray using MapReduce based proximal support vector machine classifier , 2015, Knowl. Based Syst..

[47] Stefan Jähnichen,et al. Towards a taxonomy of standards in smart data , 2015, 2015 IEEE International Conference on Big Data (Big Data).

[48] Sergio Ramírez-Gallego,et al. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach , 2015 .

[49] Francisco Herrera,et al. ROSEFW-RF: The winner algorithm for the ECBDL'14 big data competition: An extremely imbalanced big data bioinformatics problem , 2015, Knowl. Based Syst..

[50] Alberto Mozo,et al. Massively Parallel Unsupervised Feature Selection on Spark , 2015, ADBIS.

[51] Verónica Bolón-Canedo,et al. Recent advances and emerging challenges of feature selection in the context of big data , 2015, Knowl. Based Syst..

[52] Jason J. Jung,et al. Social big data: Recent achievements and new challenges , 2015, Information Fusion.

[53] Francisco Herrera,et al. Analysis of Data Preprocessing Increasing the Oversampling Ratio for Extremely Imbalanced Big Data Classification , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.

[54] Sonja Filiposka,et al. Feature Ranking Based on Information Gain for Large Classification Problems with MapReduce , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.

[55] André Carlos Ponce de Leon Ferreira de Carvalho,et al. Effect of label noise in the complexity of classification problems , 2015, Neurocomputing.

[56] Mohsen Guizani,et al. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications , 2015, IEEE Communications Surveys & Tutorials.

[57] Sachin S. Patil,et al. Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest , 2015, 2015 IEEE International Advance Computing Conference (IACC).

[58] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[59] Joseph K. Bradley,et al. Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.

[60] Ameet Talwalkar,et al. MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..

[61] Francisco Herrera,et al. Evolutionary undersampling for imbalanced big data classification , 2015, 2015 IEEE Congress on Evolutionary Computation (CEC).

[62] Charu C. Aggarwal,et al. Data Mining: The Textbook , 2015 .

[63] Murtaza Haider,et al. Beyond the hype: Big data concepts, methods, and analytics , 2015, Int. J. Inf. Manag..

[64] Lida Xu,et al. The internet of things: a survey , 2014, Information Systems Frontiers.

[65] Francisco Herrera,et al. MRPR: A MapReduce solution for prototype reduction in big data classification , 2015, Neurocomputing.

[66] Geoffrey I. Webb. Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data , 2014, 2014 IEEE International Conference on Data Mining.

[67] Fuzhen Zhuang,et al. Parallel feature selection using positive approximation based on MapReduce , 2014, 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD).

[68] Yong Zhang,et al. Parallel Implementation of Chi2 Algorithm in MapReduce Framework , 2014, HCC.

[69] Francisco Herrera,et al. On the use of MapReduce for imbalanced big data using Random Forest , 2014, Inf. Sci..

[70] María José del Jesús,et al. Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks , 2014, WIREs Data Mining Knowl. Discov..

[71] Francisco Herrera,et al. Data Preprocessing in Data Mining , 2014, Intelligent Systems Reference Library.

[72] C. L. Philip Chen,et al. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data , 2014, Inf. Sci..

[73] Ivor W. Tsang,et al. The Emerging "Big Dimensionality" , 2014, IEEE Computational Intelligence Magazine.

[74] Manesh Dalavi,et al. Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization , 2014, 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).

[75] Zhao Li,et al. Data intensive parallel feature selection method study , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).

[76] Sebastián Ventura,et al. Scalable CAIM discretization on multiple GPUs using concurrent kernels , 2014, The Journal of Supercomputing.

[77] Yonggang Wen,et al. Toward Scalable Systems for Big Data Analytics: A Technology Tutorial , 2014, IEEE Access.

[78] M. Verleysen,et al. Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[79] Alex Pentland,et al. Big Data and Management , 2014 .

[80] Rong Jin,et al. Online Feature Selection and Its Applications , 2014, IEEE Transactions on Knowledge and Data Engineering.

[81] A. Bifet,et al. A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[82] P. Baldi,et al. Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.

[83] Tiranee Achalakul,et al. Feature Reduction for Anomaly Detection in Manufacturing with MapReduce GA/kNN , 2013, 2013 International Conference on Parallel and Distributed Systems.

[84] Francisco Herrera,et al. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[85] Feng Hu,et al. A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE , 2013 .

[86] Kai Chen,et al. Differentially private feature selection under MapReduce framework , 2013 .

[87] Gilles Louppe,et al. Independent consultant , 2013 .

[88] Han Liu,et al. Challenges of Big Data Analysis. , 2013, National science review.

[89] V. Marx. Biology: The big challenges of big data , 2013, Nature.

[90] Mikel Galar,et al. Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches , 2013, Knowl. Based Syst..

[91] Javier Pérez-Rodríguez,et al. A scalable approach to simultaneous evolutionary instance and feature selection , 2013, Inf. Sci..

[92] Francisco Herrera,et al. A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning , 2013, IEEE Transactions on Knowledge and Data Engineering.

[93] Daniel E. O'Leary,et al. Artificial Intelligence and Big Data , 2013, IEEE Intelligent Systems.

[94] Veda C. Storey,et al. Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..

[95] Francisco Herrera,et al. Integrating a differential evolution feature weighting scheme into prototype generation , 2012, Neurocomputing.

[96] Ivor W. Tsang,et al. Towards ultrahigh dimensional feature selection for big data , 2012, J. Mach. Learn. Res..

[97] Zheng Zhao,et al. Massively parallel feature selection: an approach based on variance preservation , 2012, Machine Learning.

[98] Francisco Herrera,et al. On the choice of the best imputation methods for missing values considering three groups of classification methods , 2012, Knowledge and Information Systems.

[99] Ivor W. Tsang,et al. Discovering Support and Affiliated Features from Very High Dimensions , 2012, ICML.

[100] Wu Bin,et al. Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model , 2012, 2012 7th Open Cirrus Summit.

[101] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[102] K. R. Chandran,et al. An enhanced ACO algorithm to select features for text categorization and its parallelization , 2012, Expert Syst. Appl..

[103] Francisco Herrera,et al. Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[104] N. García-Pedrajas,et al. Scaling up data mining algorithms: review and taxonomy , 2012, Progress in Artificial Intelligence.

[105] Lin Dai,et al. A Discretization Algorithm of Numerical Attributes for Digital Library Evaluation Based on Data Mining Technology , 2011, ICADL.

[106] Francisco Herrera,et al. An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes , 2011, Pattern Recognit..

[107] Leon Wenliang Zhong,et al. Efficient Sparse Modeling With Automatic Feature Grouping , 2011, IEEE Transactions on Neural Networks and Learning Systems.

[108] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.

[109] Francisco Herrera,et al. Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification , 2011, Pattern Recognit..

[110] Divyakant Agrawal,et al. Big data and cloud computing: current state and future opportunities , 2011, EDBT/ICDT '11.

[111] Ivor W. Tsang,et al. Efficient Multitemplate Learning for Structured Prediction , 2011, IEEE Transactions on Neural Networks and Learning Systems.

[112] Francisco Herrera,et al. IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification , 2010, IEEE Transactions on Neural Networks.

[113] Yu Guo,et al. Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms , 2010, BMC Bioinformatics.

[114] Francisco Herrera,et al. Stratified prototype selection based on a steady-state memetic algorithm: a study of scalability , 2010, Memetic Comput..

[115] Francisco Herrera,et al. IFS-CoCo: Instance and feature selection based on cooperative coevolution with nearest neighbor rule , 2010, Pattern Recognit..

[116] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[117] Nicolás García-Pedrajas,et al. Democratic instance selection: A linear complexity instance selection algorithm based on classifier ensemble concepts , 2010, Artif. Intell..

[118] Juhnyoung Lee,et al. A view of cloud computing , 2010, CACM.

[119] Aníbal R. Figueiras-Vidal,et al. Pattern classification with missing data: a review , 2010, Neural Computing and Applications.

[120] Yen-Liang Chen,et al. A Dynamic Discretization Approach for Constructing Decision Trees with a Continuous Label , 2009, IEEE Transactions on Knowledge and Data Engineering.

[121] Charles Bouveyron,et al. Robust supervised classification with mixture models: Learning from data with uncertain labels , 2009, Pattern Recognit..

[122] Sebastian Nowozin,et al. On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[123] Jesús S. Aguilar-Ruiz,et al. Knowledge discovery from data streams , 2009, Intell. Data Anal..

[124] Nicolás García-Pedrajas,et al. A divide-and-conquer recursive approach for scaling up instance selection algorithms , 2009, Data Mining and Knowledge Discovery.

[125] Xindong Wu,et al. The Top Ten Algorithms in Data Mining , 2009 .

[126] Francesca Odone,et al. Feature selection for high-dimensional data , 2009, Comput. Manag. Sci..

[127] Francisco Herrera,et al. A memetic algorithm for evolutionary prototype selection: A scaling up approach , 2008, Pattern Recognit..

[128] Feiping Nie,et al. Trace Ratio Criterion for Feature Selection , 2008, AAAI.

[129] Dennis M. Wilkinson,et al. Large-Scale Parallel Collaborative Filtering for the Netflix Prize , 2008, AAIM.

[130] H. Bondell,et al. Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR , 2008, Biometrics.

[131] Marcel J. T. Reinders,et al. Classification in the presence of class noise using a probabilistic Kernel Fisher method , 2007, Pattern Recognit..

[132] Fabrizio Angiulli,et al. Fast Nearest Neighbor Condensation for Large Data Sets Classification , 2007, IEEE Transactions on Knowledge and Data Engineering.

[133] Hiroshi Motoda,et al. Computational Methods of Feature Selection , 2007 .

[134] Taghi M. Khoshgoftaar,et al. Improving Software Quality Prediction by Noise Filtering Techniques , 2007, Journal of Computer Science and Technology.

[135] Jianqing Fan,et al. High Dimensional Classification Using Features Annealed Independence Rules. , 2007, Annals of statistics.

[136] Kunle Olukotun,et al. Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[137] Masoud Nikravesh,et al. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[138] João Gama,et al. Discretization from data streams: applications to histograms and data mining , 2006, SAC.

[139] Francisco Herrera,et al. On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining , 2006, Appl. Soft Comput..

[140] Ramón Díaz-Uriarte,et al. Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[141] Naftali Tishby,et al. Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity , 2005, NIPS.

[142] Grigorios Tsoumakas,et al. On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams , 2005, Panhellenic Conference on Informatics.

[143] Francisco Herrera,et al. Stratification for scaling up evolutionary prototype selection , 2005, Pattern Recognit. Lett..

[144] Gene H. Golub,et al. Missing value estimation for DNA microarray gene expression data: local least squares imputation , 2005, Bioinform..

[145] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.

[146] Andrew W. Moore,et al. An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.

[147] Christian Böhm,et al. The k-Nearest Neighbour Join: Turbo Charging the KDD Process , 2004, Knowledge and Information Systems.

[148] P. Royston. Multiple Imputation of Missing Values , 2004 .

[149] Glenn Fung,et al. A Feature Selection Newton Method for Support Vector Machine Classification , 2004, Comput. Optim. Appl..

[150] Gustavo E. A. P. A. Batista,et al. A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[151] Taghi M. Khoshgoftaar,et al. Analyzing software measurement data with clustering techniques , 2004, IEEE Intelligent Systems.

[152] Andrei Broder,et al. Network Applications of Bloom Filters: A Survey , 2004, Internet Math..

[153] Francisco Herrera,et al. Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study , 2003, IEEE Trans. Evol. Comput..

[154] Xingquan Zhu,et al. Class Noise vs. Attribute Noise: A Quantitative Study , 2003, Artificial Intelligence Review.

[155] Huan Liu,et al. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[156] Fuhui Long,et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[157] Anneleen Van Assche,et al. Ensemble Methods for Noise Elimination in Classification Problems , 2003, Multiple Classifier Systems.

[158] Gustavo E. A. P. A. Batista,et al. An analysis of four missing data treatment methods for supervised learning , 2003, Appl. Artif. Intell..

[159] Roberto Alejo,et al. Analysis of new techniques to obtain quality training sets , 2003, Pattern Recognit. Lett..

[160] Isabelle Guyon,et al. An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[161] Huan Liu,et al. Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[162] Wai Lam,et al. Discovering Useful Concept Prototypes for Classification Based on Filtering and Abstraction , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[163] Srinivasan Parthasarathy,et al. Parallel Incremental 2D-Discretization on Dynamic Datasets , 2002, IPDPS.

[164] James C. Bezdek,et al. Nearest prototype classifier designs: An experimental study , 2001, Int. J. Intell. Syst..

[165] J. Friedman. Greedy function approximation: A gradient boosting machine. , 2001 .

[166] Charles Elkan,et al. The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[167] T. Schneider. Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values. , 2001 .

[168] Carla E. Brodley,et al. Identifying Mislabeled Training Data , 1999, J. Artif. Intell. Res..

[169] Foster J. Provost,et al. A Survey of Methods for Scaling Up Inductive Algorithms , 1999, Data Mining and Knowledge Discovery.

[170] Robert Gray,et al. A Proportional Hazards Model for the Subdistribution of a Competing Risk , 1999 .

[171] Sabine Loudcher,et al. FUSINTER: A Method for Discretization of Continuous Attributes , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[172] Salvatore J. Stolfo,et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.

[173] Pat Langley,et al. Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[174] Ramón López de Mántaras,et al. Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method , 1997, KDD.

[175] Filiberto Pla,et al. Prototype selection for the nearest neighbour rule through proximity graphs , 1997, Pattern Recognit. Lett..

[176] David B. Skalak,et al. Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms , 1994, ICML.

[177] Roberto Battiti,et al. Using mutual information for selecting features in supervised neural net learning , 1994, IEEE Trans. Neural Networks.

[178] Igor Kononenko,et al. Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[179] Usama M. Fayyad,et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[180] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .

[181] Usama M. Fayyad,et al. On the Handling of Continuous-Valued Attributes in Decision Tree Generation , 1992, Machine Learning.

[182] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.

[183] F. A. Seiler,et al. Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[184] Israel Spiegler,et al. Storage and retrieval considerations of binary data bases , 1985, Inf. Process. Manag..

[185] Jeffrey Scott Vitter,et al. Random sampling with a reservoir , 1985, TOMS.

[186] Douglas Comer,et al. Ubiquitous B-Tree , 1979, CSUR.

[187] Forrest W. Young,et al. Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features , 1977 .

[188] Chin-Liang Chang,et al. Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.

[189] Jack B. Dennis,et al. First version of a data flow procedure language , 1974, Symposium on Programming.

[190] Mark Michael,et al. Experimental Study of Information Measure and Inter-Intra Class Distance Ratios on Feature Selection and Orderings , 1973, IEEE Trans. Syst. Man Cybern..

[191] Dennis L. Wilson,et al. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..

[192] H. D. Brunk,et al. The Isotonic Regression Problem and its Dual , 1972 .

[193] Seetha Hari,et al. Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[194] Fabian Hueske,et al. Apache Flink , 2019, Encyclopedia of Big Data Technologies.

[195] Wolfgang Härdle,et al. Handbook of Big Data Analytics , 2018 .

[196] Joy Arulraj,et al. Apache Giraph , 2018, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..

[197] S. R,et al. Data Mining with Big Data , 2017, 2017 11th International Conference on Intelligent Systems and Control (ISCO).

[198] Hing Kai Chan,et al. Recent Development in Big Data Analytics for Business Operations and Risk Management , 2017, IEEE Transactions on Cybernetics.

[199] Soundar R. T. Kumara,et al. Cyber-physical systems in manufacturing , 2016 .

[200] Weiwei Xing,et al. A parallel feature selection method study for text classification , 2016, Neural Computing and Applications.

[201] Francisco Herrera,et al. INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control , 2016, Inf. Fusion.

[202] Shui Yu,et al. Big Data Concepts, Theories, and Applications , 2016, Springer International Publishing.

[203] Verónica Bolón-Canedo,et al. Data discretization: taxonomy and big data challenge , 2016, WIREs Data Mining Knowl. Discov..

[204] E. Sivasankar,et al. Framework for Smart Health: Toward Connected Data from Big Data , 2015 .

[205] Jay Lee,et al. A Cyber-Physical Systems architecture for Industry 4.0-based manufacturing systems , 2015 .

[206] W. B. Roberts,et al. Machine Learning: The High Interest Credit Card of Technical Debt , 2014 .

[207] อนิรุธ สืบสิงห์,et al. Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[208] 李航,et al. A Parallel Oversampling Algorithm Based on NRSBoundary-SMOTE , 2014 .

[209] Fernando Iafrate,et al. A Journey from Big Data to Smart Data , 2014 .

[210] Aixia Guo,et al. Gene Selection for Cancer Classification using Support Vector Machines , 2014 .

[211] María José del Jesús,et al. A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets , 2013, Knowl. Based Syst..

[212] Gavin Brown,et al. Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection , 2012, J. Mach. Learn. Res..

[213] Xu Yulong,et al. A Two Step Parallel Discretization Algorithm Based on Dynamic Clustering , 2012, 2012 International Conference on Computer Science and Electronics Engineering.

[214] Boris Breši. Knowledge Acquisition in Databases , 2012 .

[215] Mohamed Medhat Gaber,et al. Advances in data stream mining , 2012, WIREs Data Mining Knowl. Discov..

[216] Francisco Herrera,et al. A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[217] Sanjay Ghemawat,et al. MapReduce: a flexible data processing tool , 2010, CACM.

[218] Heng Tao Shen,et al. Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[219] Jeremy Kubica,et al. Parallel Large Scale Feature Selection for Logistic Regression , 2009, SDM.

[220] Geoffrey I. Webb,et al. Discretization for naive-Bayes learning: managing discretization bias and variance , 2008, Machine Learning.

[221] D. Kibler,et al. Instance-based learning algorithms , 2004, Machine Learning.

[222] Mark A. Hall,et al. Correlation-based Feature Selection for Machine Learning , 2003 .

[223] Nitesh V. Chawla,et al. SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[224] Dorian Pyle,et al. Data Preparation for Data Mining , 1999 .

[225] R. Agrawal. Fast Algorithms for Mining Association Rules , 1994, VLDB.

[226] Jan van Leeuwen,et al. Interval Heaps , 1993, Comput. J..

[227] Lee-Jen Wei,et al. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. , 1992, Statistics in medicine.

[228] R. Little. A Test of Missing Completely at Random for Multivariate Data with Missing Values , 1988 .

[229] Michael Stonebraker,et al. The Case for Shared Nothing , 1985, HPTS.

[230] I. Tomek. An Experiment with the Edited Nearest-Neighbor Rule , 1976 .

[231] C. G. Hilborn,et al. The Condensed Nearest Neighbor Rule , 1967 .

[232] Peter E. Hart,et al. Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.