Machine Learning: Data Pre-processing

In prognostics and health management (PHM), data pre-processing generally involves four tasks: data cleansing, normalization, feature discovery (feature engineering, extraction, and selection), and imbalanced data management. Data cleansing is the process of detecting and correcting corrupt or inaccurate data. Feature engineering uses domain knowledge of the data to construct features that improve the performance of machine learning algorithms. Feature extraction, also known as dimensionality reduction, transforms high-dimensional data into a meaningful representation of reduced dimensionality, ideally one that matches the intrinsic dimensionality of the data; linear discriminant analysis (LDA) is a dimensionality reduction technique commonly applied in the pre-processing step of classification and machine learning applications. Feature selection, also called variable selection or attribute selection, is the process of selecting a subset of relevant features for use in model construction. For imbalanced data, the synthetic minority oversampling technique (SMOTE) generates artificial minority-class samples by interpolating between a minority data point and its nearest minority-class neighbors in feature space.
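
To make these steps concrete, the following minimal sketch chains them together on synthetic condition-monitoring data. It assumes scikit-learn and imbalanced-learn are available; the dataset, column counts, and parameter values are illustrative assumptions rather than settings taken from the text.

# Minimal sketch of a PHM-style pre-processing pipeline (assumes scikit-learn
# and imbalanced-learn; all sizes and parameters below are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from imblearn.over_sampling import SMOTE

# Imbalanced toy data: three health states, with the faulty classes rare.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.85, 0.10, 0.05],
                           random_state=0)

# Simulate corrupt sensor readings, then cleanse them by mean imputation.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.02] = np.nan
X = SimpleImputer(strategy="mean").fit_transform(X)

# Normalization: zero mean and unit variance per feature.
X = StandardScaler().fit_transform(X)

# Feature selection: keep the 10 features most associated with the label.
X = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Feature extraction via LDA: at most (n_classes - 1) = 2 discriminant components.
X = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Imbalanced-data management: SMOTE synthesizes minority samples by
# interpolating between a minority point and one of its nearest minority neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_res))

The ordering is one reasonable choice, not the only one; for example, feature extraction and oversampling are sometimes swapped depending on whether the synthetic samples should be generated in the original or the reduced feature space.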
