Big Data Preprocessing: Enabling Smart Data

[1]  Francisco Herrera,et al.  Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries , 2019, IoTBDS.

[2]  Naixue Xiong,et al.  A novel code data dissemination scheme for Internet of Things through mobile vehicle of smart cities , 2019, Future Gener. Comput. Syst..

[3]  Francisco Herrera,et al.  Brightness guided preprocessing for automatic cold steel weapon detection in surveillance videos with deep learning , 2019, Neurocomputing.

[4]  Francisco Herrera,et al.  SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data , 2018, J. Comput. Sci. Technol..

[5]  Francisco Herrera,et al.  Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data , 2018, WIREs Data Mining Knowl. Discov..

[6]  Mario Piattini,et al.  From big data to smart data: a data quality perspective , 2018, EnSEmble@ESEC/SIGSOFT FSE.

[7]  Francisco Herrera,et al.  DPASF: a flink library for streaming data preprocessing , 2018, Big Data Analytics.

[8]  S. García,et al.  Online entropy-based discretization for data streaming classification , 2018, Future Gener. Comput. Syst..

[9]  Verónica Bolón-Canedo,et al.  An Information Theory-Based Feature Selection Framework for Big Data Under Apache Spark , 2018, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[10]  Francisco Herrera,et al.  Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce , 2018, Inf. Fusion.

[11]  Francisco Herrera,et al.  Principal Components Analysis Random Discretization Ensemble for Big Data , 2018, Knowl. Based Syst..

[12]  Francisco Herrera,et al.  A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark , 2018, Swarm Evol. Comput..

[13]  Luis de Marcos,et al.  Distributed ReliefF-based feature selection in Spark , 2018, Knowledge and Information Systems.

[14]  Nitin Narang,et al.  Imbalanced big data classification: a distributed implementation of SMOTE , 2018, ICDCN Workshops.

[15]  Luis Perez,et al.  The Effectiveness of Data Augmentation in Image Classification using Deep Learning , 2017, ArXiv.

[16]  María José del Jesús,et al.  KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining , 2017, Int. J. Comput. Intell. Syst..

[17]  Sergio Ramírez-Gallego,et al.  Nearest Neighbor Classification for High-Speed Big Data Streams Using Spark , 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[18]  Md. Zakirul Alam Bhuiyan,et al.  A Survey on Deep Learning in Big Data , 2017, 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC).

[19]  Robert Ivor John,et al.  An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots , 2017, IEEE Transactions on Emerging Topics in Computational Intelligence.

[20]  Jun-Hai Zhai,et al.  The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers , 2015, International Journal of Machine Learning and Cybernetics.

[21]  Francisco Herrera,et al.  SMOTE-GPU: Big Data preprocessing on commodity hardware for imbalanced classification , 2017, Progress in Artificial Intelligence.

[22]  Francisco Herrera,et al.  Enabling Smart Data: Noise filtering in Big Data classification , 2017, Inf. Sci..

[23]  Francisco Herrera,et al.  An insight into imbalanced Big Data classification: outcomes and challenges , 2017 .

[24]  Álvar Arnaiz-González,et al.  MR-DIS: democratic instance selection for big data by MapReduce , 2017, Progress in Artificial Intelligence.

[25]  Verónica Bolón-Canedo,et al.  Fast‐mRMR: Fast Minimum Redundancy Maximum Relevance Algorithm for High‐Dimensional Big Data , 2017, Int. J. Intell. Syst..

[26]  Francisco Herrera,et al.  kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data , 2017, Knowl. Based Syst..

[27]  Francisco Herrera,et al.  GPU-SME-kNN: Scalable and memory efficient kNN and lazy learning using GPUs , 2016, Inf. Sci..

[28]  Huan Liu,et al.  Challenges of Feature Selection for Big Data Analytics , 2016, IEEE Intelligent Systems.

[29]  Francisco Herrera,et al.  Big data preprocessing: methods and prospects , 2016 .

[30]  Arun Sharma,et al.  Scalable machine‐learning algorithms for big data analytics: a comprehensive review , 2016, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..

[31]  Maoguo Gong,et al.  RBoost: Label Noise-Robust Boosting Algorithm Based on a Nonconvex Loss Function and the Numerically Stable Base Learners , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[32]  Vasyl Lytvyn,et al.  Smart Data Integration by Goal Driven Ontology Learning , 2016, INNS Conference on Big Data.

[33]  Mark D. McDonnell,et al.  Understanding Data Augmentation for Classification: When to Warp? , 2016, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[34]  Nadia Essoussi,et al.  A Parallel Implementation of Relief Algorithm Using Mapreduce Paradigm , 2016, ICCCI.

[35]  Juan José Rodríguez Diez,et al.  Instance selection of linear complexity for big data , 2016, Knowl. Based Syst..

[36]  Xin Yao,et al.  A Survey on Evolutionary Computation Approaches to Feature Selection , 2016, IEEE Transactions on Evolutionary Computation.

[37]  Francisco Herrera,et al.  Evolutionary undersampling for extremely imbalanced big data classification under apache spark , 2016, 2016 IEEE Congress on Evolutionary Computation (CEC).

[38]  Nilanjan Dey,et al.  A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset , 2016, Comput. Methods Programs Biomed..

[39]  Weisong Shi,et al.  Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.

[40]  Bartosz Krawczyk,et al.  GPU-Accelerated Extreme Learning Machines for Imbalanced Data Streams with Concept Drift , 2016, ICCS.

[41]  Jim Austin,et al.  Hadoop neural network for parallel and distributed feature selection , 2016, Neural Networks.

[42]  Francisco Herrera,et al.  Tutorial on practical tips of the most influential data preprocessing algorithms in data mining , 2016, Knowl. Based Syst..

[43]  Francesco Marcelloni,et al.  A MapReduce solution for associative classification of big data , 2016, Inf. Sci..

[44]  Francisco Herrera,et al.  Multivariate Discretization Based on Evolutionary Cut Points Selection for Classification , 2016, IEEE Transactions on Cybernetics.

[45]  C. Giraud-Carrier,et al.  Efficient mining of high-speed uncertain data streams , 2015, Applied Intelligence.

[46]  Santanu Kumar Rath,et al.  Classification of microarray using MapReduce based proximal support vector machine classifier , 2015, Knowl. Based Syst..

[47]  Stefan Jähnichen,et al.  Towards a taxonomy of standards in smart data , 2015, 2015 IEEE International Conference on Big Data (Big Data).

[48]  Sergio Ramírez-Gallego,et al.  Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach , 2015 .

[49]  Francisco Herrera,et al.  ROSEFW-RF: The winner algorithm for the ECBDL'14 big data competition: An extremely imbalanced big data bioinformatics problem , 2015, Knowl. Based Syst..

[50]  Alberto Mozo,et al.  Massively Parallel Unsupervised Feature Selection on Spark , 2015, ADBIS.

[51]  Verónica Bolón-Canedo,et al.  Recent advances and emerging challenges of feature selection in the context of big data , 2015, Knowl. Based Syst..

[52]  Jason J. Jung,et al.  Social big data: Recent achievements and new challenges , 2015, Inf. Fusion.

[53]  Francisco Herrera,et al.  Analysis of Data Preprocessing Increasing the Oversampling Ratio for Extremely Imbalanced Big Data Classification , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.

[54]  Sonja Filiposka,et al.  Feature Ranking Based on Information Gain for Large Classification Problems with MapReduce , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.

[55]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Effect of label noise in the complexity of classification problems , 2015, Neurocomputing.

[56]  Mohsen Guizani,et al.  Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications , 2015, IEEE Communications Surveys & Tutorials.

[57]  Sachin S. Patil,et al.  Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest , 2015, 2015 IEEE International Advance Computing Conference (IACC).

[58]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[59]  Joseph K. Bradley,et al.  Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.

[60]  Ameet Talwalkar,et al.  MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..

[61]  Francisco Herrera,et al.  Evolutionary undersampling for imbalanced big data classification , 2015, 2015 IEEE Congress on Evolutionary Computation (CEC).

[62]  Charu C. Aggarwal,et al.  Data Mining: The Textbook , 2015 .

[63]  Murtaza Haider,et al.  Beyond the hype: Big data concepts, methods, and analytics , 2015, Int. J. Inf. Manag..

[64]  Lida Xu,et al.  The internet of things: a survey , 2014, Information Systems Frontiers.

[65]  Francisco Herrera,et al.  MRPR: A MapReduce solution for prototype reduction in big data classification , 2015, Neurocomputing.

[66]  Geoffrey I. Webb.  Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data , 2014, 2014 IEEE International Conference on Data Mining.

[67]  Fuzhen Zhuang,et al.  Parallel feature selection using positive approximation based on MapReduce , 2014, 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD).

[68]  Yong Zhang,et al.  Parallel Implementation of Chi2 Algorithm in MapReduce Framework , 2014, HCC.

[69]  Francisco Herrera,et al.  On the use of MapReduce for imbalanced big data using Random Forest , 2014, Inf. Sci..

[70]  María José del Jesús,et al.  Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks , 2014, WIREs Data Mining Knowl. Discov..

[71]  Francisco Herrera,et al.  Data Preprocessing in Data Mining , 2014, Intelligent Systems Reference Library.

[72]  C. L. Philip Chen,et al.  Data-intensive applications, challenges, techniques and technologies: A survey on Big Data , 2014, Inf. Sci..

[73]  Ivor W. Tsang,et al.  The Emerging "Big Dimensionality" , 2014, IEEE Computational Intelligence Magazine.

[74]  Manesh Dalavi,et al.  Hadoop MapReduce implementation of a novel scheme for term weighting in text categorization , 2014, 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).

[75]  Zhao Li,et al.  Data intensive parallel feature selection method study , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).

[76]  Sebastián Ventura,et al.  Scalable CAIM discretization on multiple GPUs using concurrent kernels , 2014, The Journal of Supercomputing.

[77]  Yonggang Wen,et al.  Toward Scalable Systems for Big Data Analytics: A Technology Tutorial , 2014, IEEE Access.

[78]  M. Verleysen,et al.  Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[79]  Alex Pentland,et al.  Big Data and Management , 2014 .

[80]  Rong Jin,et al.  Online Feature Selection and Its Applications , 2014, IEEE Transactions on Knowledge and Data Engineering.

[81]  A. Bifet,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[82]  P. Baldi,et al.  Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.

[83]  Tiranee Achalakul,et al.  Feature Reduction for Anomaly Detection in Manufacturing with MapReduce GA/kNN , 2013, 2013 International Conference on Parallel and Distributed Systems.

[84]  Francisco Herrera,et al.  An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[85]  Feng Hu,et al.  A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE , 2013 .

[86]  Kai Chen,et al.  Differentially private feature selection under MapReduce framework , 2013 .

[87]  Gilles Louppe,et al.  Independent consultant , 2013 .

[88]  Han Liu,et al.  Challenges of Big Data Analysis. , 2013, National science review.

[89]  V. Marx.  Biology: The big challenges of big data , 2013, Nature.

[90]  Mikel Galar,et al.  Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches , 2013, Knowl. Based Syst..

[91]  Javier Pérez-Rodríguez,et al.  A scalable approach to simultaneous evolutionary instance and feature selection , 2013, Inf. Sci..

[92]  Francisco Herrera,et al.  A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning , 2013, IEEE Transactions on Knowledge and Data Engineering.

[93]  Daniel E. O'Leary,et al.  Artificial Intelligence and Big Data , 2013, IEEE Intelligent Systems.

[94]  Veda C. Storey,et al.  Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..

[95]  Francisco Herrera,et al.  Integrating a differential evolution feature weighting scheme into prototype generation , 2012, Neurocomputing.

[96]  Ivor W. Tsang,et al.  Towards ultrahigh dimensional feature selection for big data , 2012, J. Mach. Learn. Res..

[97]  Zheng Zhao,et al.  Massively parallel feature selection: an approach based on variance preservation , 2012, Machine Learning.

[98]  Francisco Herrera,et al.  On the choice of the best imputation methods for missing values considering three groups of classification methods , 2012, Knowledge and Information Systems.

[99]  Ivor W. Tsang,et al.  Discovering Support and Affiliated Features from Very High Dimensions , 2012, ICML.

[100]  Wu Bin,et al.  Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model , 2012, 2012 7th Open Cirrus Summit.

[101]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[102]  K. R. Chandran,et al.  An enhanced ACO algorithm to select features for text categorization and its parallelization , 2012, Expert Syst. Appl..

[103]  Francisco Herrera,et al.  Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[104]  N. García-Pedrajas,et al.  Scaling up data mining algorithms: review and taxonomy , 2012, Progress in Artificial Intelligence.

[105]  Lin Dai,et al.  A Discretization Algorithm of Numerical Attributes for Digital Library Evaluation Based on Data Mining Technology , 2011, ICADL.

[106]  Francisco Herrera,et al.  An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes , 2011, Pattern Recognit..

[107]  Leon Wenliang Zhong,et al.  Efficient Sparse Modeling With Automatic Feature Grouping , 2011, IEEE Transactions on Neural Networks and Learning Systems.

[108]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[109]  Francisco Herrera,et al.  Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification , 2011, Pattern Recognit..

[110]  Divyakant Agrawal,et al.  Big data and cloud computing: current state and future opportunities , 2011, EDBT/ICDT '11.

[111]  Ivor W. Tsang,et al.  Efficient Multitemplate Learning for Structured Prediction , 2011, IEEE Transactions on Neural Networks and Learning Systems.

[112]  Francisco Herrera,et al.  IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification , 2010, IEEE Transactions on Neural Networks.

[113]  Yu Guo,et al.  Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms , 2010, BMC Bioinformatics.

[114]  Francisco Herrera,et al.  Stratified prototype selection based on a steady-state memetic algorithm: a study of scalability , 2010, Memetic Comput..

[115]  Francisco Herrera,et al.  IFS-CoCo: Instance and feature selection based on cooperative coevolution with nearest neighbor rule , 2010, Pattern Recognit..

[116]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[117]  Nicolás García-Pedrajas,et al.  Democratic instance selection: A linear complexity instance selection algorithm based on classifier ensemble concepts , 2010, Artif. Intell..

[118]  Juhnyoung Lee,et al.  A view of cloud computing , 2010, CACM.

[119]  Aníbal R. Figueiras-Vidal,et al.  Pattern classification with missing data: a review , 2010, Neural Computing and Applications.

[120]  Yen-Liang Chen,et al.  A Dynamic Discretization Approach for Constructing Decision Trees with a Continuous Label , 2009, IEEE Transactions on Knowledge and Data Engineering.

[121]  Charles Bouveyron,et al.  Robust supervised classification with mixture models: Learning from data with uncertain labels , 2009, Pattern Recognit..

[122]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[123]  Jesús S. Aguilar-Ruiz,et al.  Knowledge discovery from data streams , 2009, Intell. Data Anal..

[124]  Nicolás García-Pedrajas,et al.  A divide-and-conquer recursive approach for scaling up instance selection algorithms , 2009, Data Mining and Knowledge Discovery.

[125]  Xindong Wu,et al.  The Top Ten Algorithms in Data Mining , 2009 .

[126]  Francesca Odone,et al.  Feature selection for high-dimensional data , 2009, Comput. Manag. Sci..

[127]  Francisco Herrera,et al.  A memetic algorithm for evolutionary prototype selection: A scaling up approach , 2008, Pattern Recognit..

[128]  Feiping Nie,et al.  Trace Ratio Criterion for Feature Selection , 2008, AAAI.

[129]  Dennis M. Wilkinson,et al.  Large-Scale Parallel Collaborative Filtering for the Netflix Prize , 2008, AAIM.

[130]  H. Bondell,et al.  Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR , 2008, Biometrics.

[131]  Marcel J. T. Reinders,et al.  Classification in the presence of class noise using a probabilistic Kernel Fisher method , 2007, Pattern Recognit..

[132]  Fabrizio Angiulli,et al.  Fast Nearest Neighbor Condensation for Large Data Sets Classification , 2007, IEEE Transactions on Knowledge and Data Engineering.

[133]  Hiroshi Motoda,et al.  Computational Methods of Feature Selection , 2007 .

[134]  Taghi M. Khoshgoftaar,et al.  Improving Software Quality Prediction by Noise Filtering Techniques , 2007, Journal of Computer Science and Technology.

[135]  Jianqing Fan,et al.  High Dimensional Classification Using Features Annealed Independence Rules. , 2007, Annals of statistics.

[136]  Kunle Olukotun,et al.  Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[137]  Masoud Nikravesh,et al.  Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[138]  João Gama,et al.  Discretization from data streams: applications to histograms and data mining , 2006, SAC.

[139]  Francisco Herrera,et al.  On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining , 2006, Appl. Soft Comput..

[140]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[141]  Naftali Tishby,et al.  Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity , 2005, NIPS.

[142]  Grigorios Tsoumakas,et al.  On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams , 2005, Panhellenic Conference on Informatics.

[143]  Francisco Herrera,et al.  Stratification for scaling up evolutionary prototype selection , 2005, Pattern Recognit. Lett..

[144]  Gene H. Golub,et al.  Missing value estimation for DNA microarray gene expression data: local least squares imputation , 2005, Bioinform..

[145]  Sanjay Ghemawat,et al.  MapReduce: simplified data processing on large clusters , 2008, CACM.

[146]  Andrew W. Moore,et al.  An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.

[147]  Christian Böhm,et al.  The k-Nearest Neighbour Join: Turbo Charging the KDD Process , 2004, Knowledge and Information Systems.

[148]  P. Royston.  Multiple Imputation of Missing Values , 2004 .

[149]  Glenn Fung,et al.  A Feature Selection Newton Method for Support Vector Machine Classification , 2004, Comput. Optim. Appl..

[150]  Gustavo E. A. P. A. Batista,et al.  A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[151]  Taghi M. Khoshgoftaar,et al.  Analyzing software measurement data with clustering techniques , 2004, IEEE Intelligent Systems.

[152]  Andrei Broder,et al.  Network Applications of Bloom Filters: A Survey , 2004, Internet Math..

[153]  Francisco Herrera,et al.  Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study , 2003, IEEE Trans. Evol. Comput..

[154]  Xingquan Zhu,et al.  Class Noise vs. Attribute Noise: A Quantitative Study , 2003, Artificial Intelligence Review.

[155]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[156]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[157]  Anneleen Van Assche,et al.  Ensemble Methods for Noise Elimination in Classification Problems , 2003, Multiple Classifier Systems.

[158]  Gustavo E. A. P. A. Batista,et al.  An analysis of four missing data treatment methods for supervised learning , 2003, Appl. Artif. Intell..

[159]  Roberto Alejo,et al.  Analysis of new techniques to obtain quality training sets , 2003, Pattern Recognit. Lett..

[160]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[161]  Huan Liu,et al.  Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[162]  Wai Lam,et al.  Discovering Useful Concept Prototypes for Classification Based on Filtering and Abstraction , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[163]  Srinivasan Parthasarathy,et al.  Parallel Incremental 2D-Discretization on Dynamic Datasets , 2002, IPDPS.

[164]  James C. Bezdek,et al.  Nearest prototype classifier designs: An experimental study , 2001, Int. J. Intell. Syst..

[165]  J. Friedman.  Greedy function approximation: A gradient boosting machine. , 2001 .

[166]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[167]  T. Schneider.  Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values. , 2001 .

[168]  Carla E. Brodley,et al.  Identifying Mislabeled Training Data , 1999, J. Artif. Intell. Res..

[169]  Foster J. Provost,et al.  A Survey of Methods for Scaling Up Inductive Algorithms , 1999, Data Mining and Knowledge Discovery.

[170]  Robert Gray,et al.  A Proportional Hazards Model for the Subdistribution of a Competing Risk , 1999 .

[171]  Sabine Loudcher,et al.  FUSINTER: A Method for Discretization of Continuous Attributes , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[172]  Salvatore J. Stolfo,et al.  Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.

[173]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[174]  Ramón López de Mántaras,et al.  Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method , 1997, KDD.

[175]  Filiberto Pla,et al.  Prototype selection for the nearest neighbour rule through proximity graphs , 1997, Pattern Recognit. Lett..

[176]  David B. Skalak,et al.  Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms , 1994, ICML.

[177]  Roberto Battiti,et al.  Using mutual information for selecting features in supervised neural net learning , 1994, IEEE Trans. Neural Networks.

[178]  Igor Kononenko,et al.  Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[179]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[180]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[181]  Usama M. Fayyad,et al.  On the Handling of Continuous-Valued Attributes in Decision Tree Generation , 1992, Machine Learning.

[182]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[183]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[184]  Israel Spiegler,et al.  Storage and retrieval considerations of binary data bases , 1985, Inf. Process. Manag..

[185]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[186]  Douglas Comer,et al.  Ubiquitous B-Tree , 1979, CSUR.

[187]  Forrest W. Young,et al.  Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features , 1977 .

[188]  Chin-Liang Chang,et al.  Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.

[189]  Jack B. Dennis,et al.  First version of a data flow procedure language , 1974, Symposium on Programming.

[190]  Mark Michael,et al.  Experimental Study of Information Measure and Inter-Intra Class Distance Ratios on Feature Selection and Orderings , 1973, IEEE Trans. Syst. Man Cybern..

[191]  Dennis L. Wilson,et al.  Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..

[192]  H. D. Brunk,et al.  The Isotonic Regression Problem and its Dual , 1972 .

[193]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[194]  Fabian Hueske,et al.  Apache Flink , 2019, Encyclopedia of Big Data Technologies.

[195]  Wolfgang Härdle,et al.  Handbook of Big Data Analytics , 2018 .

[196]  Joy Arulraj,et al.  Apache Giraph , 2018, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..

[197]  S. R,et al.  Data Mining with Big Data , 2017, 2017 11th International Conference on Intelligent Systems and Control (ISCO).

[198]  Hing Kai Chan,et al.  Recent Development in Big Data Analytics for Business Operations and Risk Management , 2017, IEEE Transactions on Cybernetics.

[199]  Soundar R. T. Kumara,et al.  Cyber-physical systems in manufacturing , 2016 .

[200]  Weiwei Xing,et al.  A parallel feature selection method study for text classification , 2016, Neural Computing and Applications.

[201]  Francisco Herrera,et al.  INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control , 2016, Inf. Fusion.

[202]  Shui Yu,et al.  Big Data Concepts, Theories, and Applications , 2016, Springer International Publishing.

[203]  Verónica Bolón-Canedo,et al.  Data discretization: taxonomy and big data challenge , 2016, WIREs Data Mining Knowl. Discov..

[204]  E. Sivasankar,et al.  Framework for Smart Health: Toward Connected Data from Big Data , 2015 .

[205]  Jay Lee,et al.  A Cyber-Physical Systems architecture for Industry 4.0-based manufacturing systems , 2015 .

[206]  W. B. Roberts,et al.  Machine Learning: The High Interest Credit Card of Technical Debt , 2014 .

[207]  อนิรุธ สืบสิงห์,et al.  Data Mining: Practical Machine Learning Tools and Techniques , 2014 .

[208]  李航,et al.  A Parallel Oversampling Algorithm Based on NRSBoundary-SMOTE , 2014 .

[209]  Fernando Iafrate,et al.  A Journey from Big Data to Smart Data , 2014 .

[210]  Aixia Guo,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2014 .

[211]  María José del Jesús,et al.  A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets , 2013, Knowl. Based Syst..

[212]  Gavin Brown,et al.  Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection , 2012, J. Mach. Learn. Res..

[213]  Xu Yulong,et al.  A Two Step Parallel Discretization Algorithm Based on Dynamic Clustering , 2012, 2012 International Conference on Computer Science and Electronics Engineering.

[214]  Boris Breši Knowledge Acquisition in Databases , 2012 .

[215]  Mohamed Medhat Gaber,et al.  Advances in data stream mining , 2012, WIREs Data Mining Knowl. Discov..

[216]  Francisco Herrera,et al.  A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[217]  Sanjay Ghemawat,et al.  MapReduce: a flexible data processing tool , 2010, CACM.

[218]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[219]  Jeremy Kubica,et al.  Parallel Large Scale Feature Selection for Logistic Regression , 2009, SDM.

[220]  Geoffrey I. Webb,et al.  Discretization for naive-Bayes learning: managing discretization bias and variance , 2008, Machine Learning.

[221]  D. Kibler,et al.  Instance-based learning algorithms , 2004, Machine Learning.

[222]  Mark A. Hall,et al.  Correlation-based Feature Selection for Machine Learning , 2003 .

[223]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[224]  Dorian Pyle,et al.  Data Preparation for Data Mining , 1999 .

[225]  R. Agrawal.  Fast Algorithms for Mining Association Rules , 1994, VLDB.

[226]  Jan van Leeuwen,et al.  Interval Heaps , 1993, Comput. J..

[227]  Lee-Jen Wei,et al.  The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. , 1992, Statistics in medicine.

[228]  R. Little.  A Test of Missing Completely at Random for Multivariate Data with Missing Values , 1988 .

[229]  Michael Stonebraker,et al.  The Case for Shared Nothing , 1985, HPTS.

[230]  I. Tomek.  An Experiment with the Edited Nearest-Neighbor Rule , 1976 .

[231]  C. G. Hilborn,et al.  The Condensed Nearest Neighbor Rule , 1967 .

[232]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.