MAPREDUCE FRAMEWORK FOR ANOMALY DETECTION IN MANUFACTURING DATA

MISS SIKANA TANUPABRUNGSUN

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ENGINEERING (COMPUTER ENGINEERING)
FACULTY OF ENGINEERING
KING MONGKUT’S UNIVERSITY OF TECHNOLOGY THONBURI

Manufacturing data is an important source of knowledge that can be used to enhance production capability. Detecting the causes of defects can lead to improvements in production. However, production records generally contain an enormous number of features, and many of the resulting alert messages are redundant. Thus, it is almost impossible in practice to monitor all features at once. This research proposes a feature reduction framework designed to identify a subset of informative features and the correlation groups. By monitoring fewer features, the number of alert messages can be decreased. In our methodology, manufacturing data are pre-processed and used as inputs. Feature selection is then performed by wrapping a Genetic Algorithm (GA) around a k-Nearest Neighbor (kNN) classifier. To improve performance, the proposed technique was parallelized with MapReduce. The results showed that the number of features can be reduced by 49.02% with 83.95% accuracy. In addition, with MapReduce on the cloud, performance improved by a factor of 17.06. The framework's results were validated by both a statistical method and expert analysts from the manufacturing industry.
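The wrapper approach described above can be illustrated with a minimal sketch, which is not the thesis implementation. It assumes a binary chromosome (one bit per feature), leave-one-out kNN accuracy as the fitness signal, a small penalty on the number of selected features, and synthetic data. The names `make_data`, `knn_accuracy`, and `select_features`, as well as all parameter values, are hypothetical. The per-chromosome fitness evaluations are independent of each other, so they are one plausible candidate for a map phase. The abstract does not say how the work was actually divided.

```python
import random


def make_data(n_rows=40, n_noise=4, seed=1):
    """Synthetic stand-in for production records: 2 informative + noisy features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        label = i % 2
        informative = [4.0 * label + rng.gauss(0, 0.5),
                       -4.0 * label + rng.gauss(0, 0.5)]
        noise = [rng.gauss(0, 3.0) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(label)
    return rows, labels


def knn_accuracy(rows, labels, mask, k=3):
    """Leave-one-out kNN accuracy using only the features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        neighbours = sorted(
            (sum((row[j] - other[j]) ** 2 for j in cols), labels[m])
            for m, other in enumerate(rows) if m != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) == labels[i]:
            correct += 1
    return correct / len(rows)


def select_features(rows, labels, generations=15, pop_size=12,
                    penalty=0.01, mutation_rate=0.1, seed=0):
    """GA wrapper: evolve feature masks scored by kNN accuracy (assumed fitness)."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        # Assumed trade-off: reward accuracy, lightly penalise large subsets.
        return knn_accuracy(rows, labels, mask) - penalty * sum(mask) / n

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Independent evaluations: the step a map phase could distribute.
        scores = list(map(fitness, population))
        order = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        parents = [population[i] for i in order[:pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


rows, labels = make_data()
best = select_features(rows, labels)
print("selected mask:", best)
print("leave-one-out accuracy:", knn_accuracy(rows, labels, best))
```

Keeping the top half of each generation unchanged (elitism) means the best mask found so far is never lost. The feature-count penalty is one simple way to push the search toward smaller subsets, and other fitness definitions are equally plausible.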
