Large-scale Artificial Neural Network: MapReduce-based Deep Learning

As the scale of data continues to grow, the original back-propagation neural network faces two non-trivial challenges: the sheer volume of data makes it difficult to maintain both efficiency and accuracy, and redundant data aggravates the system workload. This project addresses these issues by combining a deep learning algorithm with a cloud computing platform to handle large-scale data. A MapReduce-based handwritten character recognizer is designed to verify the efficiency improvement this mechanism achieves when training on practical large-scale data. Discussion and experiments illustrate how the deep learning algorithm trains on handwritten digit data, how MapReduce is applied to the deep learning neural network, and why this combination accelerates computation. Beyond performance, the report also examines scalability and robustness. The system ships with two demonstration programs that visually illustrate our handwritten digit recognition/encoding application.
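
As a rough illustration of the mechanism described above, the sketch below shows one synchronous training step in the map/reduce style: each map task runs the forward and backward pass of back-propagation on its own shard of the training data and emits partial gradients, and a reduce step sums those gradients before a single weight update. This is a minimal NumPy sketch under assumed settings, not the project's actual implementation; the toy one-hidden-layer network, random stand-in data, shard count, and learning rate are all illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def map_task(W1, W2, X_shard, y_shard):
        """Map step: forward/backward pass on one data shard, emit partial gradients."""
        h = np.tanh(X_shard @ W1)                     # hidden-layer activations
        logits = h @ W2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        d_logits = p
        d_logits[np.arange(len(y_shard)), y_shard] -= 1.0
        dW2 = h.T @ d_logits
        dh = (d_logits @ W2.T) * (1.0 - h ** 2)       # back-propagate through tanh
        dW1 = X_shard.T @ dh
        return dW1, dW2, len(y_shard)

    def reduce_task(partials):
        """Reduce step: sum gradients and sample counts from all mappers."""
        dW1 = sum(p[0] for p in partials)
        dW2 = sum(p[1] for p in partials)
        n = sum(p[2] for p in partials)
        return dW1 / n, dW2 / n

    # Random data standing in for MNIST-sized input: 784 features, 10 classes, 4 shards.
    X = rng.standard_normal((400, 784))
    y = rng.integers(0, 10, size=400)
    W1 = 0.01 * rng.standard_normal((784, 64))
    W2 = 0.01 * rng.standard_normal((64, 10))
    lr = 0.1

    shards = zip(np.array_split(X, 4), np.array_split(y, 4))
    partials = [map_task(W1, W2, Xs, ys) for Xs, ys in shards]  # mappers run in parallel on the cluster
    gW1, gW2 = reduce_task(partials)                            # reducer aggregates
    W1 -= lr * gW1                                              # one synchronous update
    W2 -= lr * gW2

On an actual MapReduce cluster, the list comprehension above would be replaced by map tasks scheduled across worker nodes, which is where the acceleration over a single machine would come from.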
