Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data
Francisco Herrera | Julián Luengo | Isaac Triguero | Jesús Maillo | Salvador García | Diego García-Gil
[1] Stefan Jähnichen,et al. Towards a taxonomy of standards in smart data , 2015, 2015 IEEE International Conference on Big Data (Big Data).
[2] Jianqing Fan,et al. High Dimensional Classification Using Features Annealed Independence Rules. , 2007, Annals of statistics.
[3] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .
[4] Arun Sharma,et al. Scalable machine‐learning algorithms for big data analytics: a comprehensive review , 2016, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..
[5] Fabrizio Angiulli,et al. Fast Nearest Neighbor Condensation for Large Data Sets Classification , 2007, IEEE Transactions on Knowledge and Data Engineering.
[6] Francisco Herrera,et al. A memetic algorithm for evolutionary prototype selection: A scaling up approach , 2008, Pattern Recognit..
[7] James M. Keller,et al. A fuzzy K-nearest neighbor algorithm , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[9] Vasyl Lytvyn,et al. Smart Data Integration by Goal Driven Ontology Learning , 2016, INNS Conference on Big Data.
[10] James C. Bezdek,et al. Nearest prototype classifier designs: An experimental study , 2001, Int. J. Intell. Syst..
[11] I. Tomek. An Experiment with the Edited Nearest-Neighbor Rule , 1976 .
[12] Robert Ivor John,et al. An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots , 2017, IEEE Transactions on Emerging Topics in Computational Intelligence.
[13] Sunil Arya,et al. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions , 1998, JACM.
[14] Kilian Q. Weinberger,et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.
[15] Francisco Herrera,et al. IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification , 2010, IEEE Transactions on Neural Networks.
[16] Gene H. Golub,et al. Missing value estimation for DNA microarray gene expression data: local least squares imputation , 2005, Bioinform..
[17] Francisco Herrera,et al. Data Preprocessing in Data Mining , 2014, Intelligent Systems Reference Library.
[18] Salvatore J. Stolfo,et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.
[19] C. L. Philip Chen,et al. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data , 2014, Inf. Sci..
[20] Luis de Marcos,et al. Distributed ReliefF-based feature selection in Spark , 2018, Knowledge and Information Systems.
[21] Aníbal R. Figueiras-Vidal,et al. Pattern classification with missing data: a review , 2010, Neural Computing and Applications.
[22] Verónica Bolón-Canedo,et al. Fast‐mRMR: Fast Minimum Redundancy Maximum Relevance Algorithm for High‐Dimensional Big Data , 2017, Int. J. Intell. Syst..
[23] C. G. Hilborn,et al. The Condensed Nearest Neighbor Rule , 1967 .
[24] Francisco Herrera,et al. A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[25] T. Schneider. Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values. , 2001 .
[26] Elisa Bertino,et al. Indexing Techniques for Advanced Database Systems , 1997, The Springer International Series on Advances in Database Systems.
[27] R. Little. A Test of Missing Completely at Random for Multivariate Data with Missing Values , 1988 .
[28] David B. Skalak,et al. Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms , 1994, ICML.
[29] David W. Aha,et al. A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms , 1997, Artificial Intelligence Review.
[30] Hammou Fadili,et al. Towards an automatic analyze and standardization of unstructured data in the context of big and linked data , 2016, MEDES.
[31] Bernard De Baets,et al. Supervised distance metric learning through maximization of the Jeffrey divergence , 2017, Pattern Recognit..
[32] Han Liu,et al. Challenges of Big Data Analysis. , 2013, National science review.
[33] Francisco Herrera,et al. CNC-NOS: Class noise cleaning by ensemble filtering and noise scoring , 2018, Knowl. Based Syst..
[34] Ho-Hyun Park,et al. Tagging and classifying facial images in cloud environments based on KNN using MapReduce , 2015 .
[35] Anil K. Ghosh,et al. On some transformations of high dimension, low sample size data for nearest neighbor classification , 2015, Machine Learning.
[36] Verónica Bolón-Canedo,et al. Data discretization: taxonomy and big data challenge , 2016, WIREs Data Mining Knowl. Discov..
[37] V. Marx. Biology: The big challenges of big data , 2013, Nature.
[38] Feifei Li,et al. Efficient parallel kNN joins for large data in MapReduce , 2012, EDBT '12.
[39] Francisco Herrera,et al. IFS-CoCo: Instance and feature selection based on cooperative coevolution with nearest neighbor rule , 2010, Pattern Recognit..
[40] Francisco Herrera,et al. Fuzzy nearest neighbor algorithms: Taxonomy, experimental analysis and prospects , 2014, Inf. Sci..
[41] E. Sivasankar,et al. Framework for Smart Health: Toward Connected Data from Big Data , 2015 .
[42] Francisco Herrera,et al. An insight into imbalanced Big Data classification: outcomes and challenges , 2017 .
[43] Alexandr Andoni,et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[44] Francisco Herrera,et al. Enabling Smart Data: Noise filtering in Big Data classification , 2017, Inf. Sci..
[45] Andrew W. Moore,et al. An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.
[46] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[47] Maya R. Gupta,et al. Completely Lazy Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[48] Cheng Soon Ong,et al. Multivariate spearman's ρ for aggregating ranks using copulas , 2016 .
[49] Chin-Liang Chang,et al. Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.
[50] Francisco Herrera,et al. On the choice of the best imputation methods for missing values considering three groups of classification methods , 2012, Knowledge and Information Systems.
[51] María José del Jesús,et al. Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks , 2014, WIREs Data Mining Knowl. Discov..
[52] M. Verleysen,et al. Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.
[53] John L. Casti,et al. A new initial-value method for on-line filtering and estimation (Corresp.) , 1972, IEEE Trans. Inf. Theory.
[54] Jon Louis Bentley,et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.
[55] Francisco Herrera,et al. On the use of MapReduce for imbalanced big data using Random Forest , 2014, Inf. Sci..
[56] Yidi Wang,et al. A new general nearest neighbor classification based on the mutual neighborhood information , 2017, Knowl. Based Syst..
[57] Francisco Herrera,et al. From Big Data to Smart Data with the K-Nearest Neighbours Algorithm , 2016, 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData).
[58] Francisco Herrera,et al. MRPR: A MapReduce solution for prototype reduction in big data classification , 2015, Neurocomputing.
[59] Sanjay Ghemawat,et al. MapReduce: a flexible data processing tool , 2010, CACM.
[60] Roberto Alejo,et al. Analysis of new techniques to obtain quality training sets , 2003, Pattern Recognit. Lett..
[61] Nitin Narang,et al. Imbalanced big data classification: a distributed implementation of SMOTE , 2018, ICDCN Workshops.
[62] Luc Devroye,et al. Lectures on the Nearest Neighbor Method , 2015 .
[63] Veda C. Storey,et al. Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..
[64] Francisco Herrera,et al. Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce , 2018, Inf. Fusion.
[65] Ameet Talwalkar,et al. MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..
[66] Xin Yao,et al. A Survey on Evolutionary Computation Approaches to Feature Selection , 2016, IEEE Transactions on Evolutionary Computation.
[67] María José del Jesús,et al. KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining , 2017, Int. J. Comput. Intell. Syst..
[68] Isabelle Guyon,et al. An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..
[69] Taghi M. Khoshgoftaar,et al. Analyzing software measurement data with clustering techniques , 2004, IEEE Intelligent Systems.
[70] Peter E. Hart,et al. Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.
[71] Francisco Herrera,et al. Big data preprocessing: methods and prospects , 2016 .
[72] Francisco Herrera,et al. kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data , 2017, Knowl. Based Syst..
[73] Gustavo E. A. P. A. Batista,et al. An analysis of four missing data treatment methods for supervised learning , 2003, Appl. Artif. Intell..
[74] Sergio Ramírez-Gallego,et al. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach , 2015 .
[75] Álvar Arnaiz-González,et al. MR-DIS: democratic instance selection for big data by MapReduce , 2017, Progress in Artificial Intelligence.
[76] Dennis L. Wilson,et al. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..
[77] Btissam Zerhari. Class noise elimination approach for large datasets based on a combination of classifiers , 2016, 2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech).
[78] Fernando Iafrate,et al. A Journey from Big Data to Smart Data , 2014 .
[79] Swagatam Das,et al. A feature weighted penalty based dissimilarity measure for k-nearest neighbor classification with missing features , 2016, Pattern Recognit. Lett..
[80] Filiberto Pla,et al. Prototype selection for the nearest neighbour rule through proximity graphs , 1997, Pattern Recognit. Lett..
[81] Francisco Herrera,et al. Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study , 2003, IEEE Trans. Evol. Comput..
[82] Francisco Herrera,et al. Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification , 2011, Pattern Recognit..
[83] Jiandong Wang,et al. Margin distribution explanation on metric learning for nearest neighbor classification , 2016, Neurocomputing.
[84] Francisco Herrera,et al. Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[85] Igor Kononenko,et al. Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.
[86] Xingquan Zhu,et al. Class Noise vs. Attribute Noise: A Quantitative Study , 2003, Artificial Intelligence Review.
[87] André Carlos Ponce de Leon Ferreira de Carvalho,et al. Effect of label noise in the complexity of classification problems , 2015, Neurocomputing.
[88] Peter E. Hart,et al. The condensed nearest neighbor rule (Corresp.) , 1968, IEEE Trans. Inf. Theory.
[89] D. Kibler,et al. Instance-based learning algorithms , 2004, Machine Learning.
[90] Hiroshi Motoda,et al. Computational Methods of Feature Selection , 2007 .
[91] Ivor W. Tsang,et al. Towards ultrahigh dimensional feature selection for big data , 2012, J. Mach. Learn. Res..
[92] Francisco Herrera,et al. Exact fuzzy k-nearest neighbor classification for big datasets , 2017, 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE).
[93] Mohsen Guizani,et al. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications , 2015, IEEE Communications Surveys & Tutorials.
[94] Jeffrey K. Uhlmann,et al. Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..
[95] Naftali Tishby,et al. Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity , 2005, NIPS.