Case Study II: Data Classification using Scalding and Spark

It is important to characterize learning problems by the type of data they use. Knowledge about the data matters because similar learning techniques can be applied to similar data types. For example, natural language processing and bioinformatics use very similar string tools for natural language text and DNA sequences. The most basic data entities are vectors. For example, an insurance corporation may want a vector of patient details such as blood pressure, heart rate, height, weight, cholesterol, smoking status, and gender to infer the patient's life expectancy. A farmer might be interested in determining the ripeness of fruit based on a vector of size, weight, and spectral data. An electrical engineer may want to find the dependency between voltage and current. A search engine might want a vector of counts that describe the frequency of words.
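
As a rough illustration of the search engine example, the sketch below turns a piece of text into a vector of word counts. It is a minimal sketch in plain Python, not the case study's Scalding or Spark code. The function name `word_count_vector`, the vocabulary, and the sample document are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def word_count_vector(text, vocabulary):
    """Return the count of each vocabulary word in text, in vocabulary order."""
    # Lowercase and split on whitespace; real tokenization would be more careful.
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur, so absent words map to 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and document, assumed for this example only.
vocabulary = ["data", "learning", "vector", "spark"]
document = "Learning from data means turning data into a vector"

vector = word_count_vector(document, vocabulary)
print(vector)
```

Each position in the resulting list corresponds to one vocabulary word, so documents of different lengths map to vectors of the same dimension, which is what lets the same learning technique be applied across them.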
