An Introduction to Machine Learning

Machine learning has a long history and has been studied extensively, but it has gained wide acceptance only in recent years, as computing power has grown enough to make it practical for many everyday applications. This document discusses the basic concepts, models, and formulations behind the problems of machine learning, influence maximization, and network diffusion. It also analyzes selected algorithms developed to address these problems. It offers a high-level view of the requirements, assumptions, and complexity associated with these problems, as well as their connection to real-life scenarios. For a more rigorous and in-depth analysis, the reader is referred to the literature cited throughout this document. In particular, [1] is an excellent reference for machine learning.

[1]  R. Bellman. A Problem in the Sequential Design of Experiments, 1954.

[2]  C. K. Chow et al. An optimum character recognition system using decision functions, 1957, IRE Trans. Electron. Comput.

[3]  F. Rosenblatt et al. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.

[4]  E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[5]  Thomas M. Cover et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition, 1965, IEEE Trans. Electron. Comput.

[6]  Irving John Good et al. The Estimation of Probabilities: An Essay on Modern Bayesian Methods, 1965.

[7]  Lawrence J. Fogel et al. Artificial Intelligence through Simulated Evolution, 1966.

[8]  Philip J. Stone et al. Experiments in induction, 1966.

[9]  Peter E. Hart et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[10]  C. G. Hilborn et al. The Condensed Nearest Neighbor Rule, 1967.

[11]  J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[12]  Thomas M. Cover et al. Estimation by the nearest neighbor rule, 1968, IEEE Trans. Inf. Theory.

[13]  Ryszard S. Michalski et al. On the Quasi-Minimal Solution of the General Covering Problem, 1969.

[14]  Martin E. Hellman et al. The Nearest Neighbor Classification Rule with a Reject Option, 1970, IEEE Trans. Syst. Sci. Cybern.

[15]  Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[16]  J. Meditch et al. Applied optimal control, 1972, IEEE Transactions on Automatic Control.

[17]  P. Werbos et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[18]  John H. Holland et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[19]  I. Tomek et al. Two Modifications of CNN, 1976.

[20]  Sahibsingh A. Dudani. The Distance-Weighted k-Nearest-Neighbor Rule, 1976, IEEE Transactions on Systems, Man, and Cybernetics.

[21]  Jon Louis Bentley et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.

[22]  Donald Michie et al. Expert systems in the micro-electronic age, 1979.

[23]  M. Narasimha Murty et al. A computationally efficient technique for data-clustering, 1980, Pattern Recognit.

[24]  Tom M. Mitchell et al. Generalization as Search, 2002.

[25]  Leslie G. Valiant et al. A theory of the learnable, 1984, STOC '84.

[26]  Richard S. Sutton et al. Temporal credit assignment in reinforcement learning, 1984.

[27]  John R. Anderson et al. Machine Learning: An Artificial Intelligence Approach, 2009.

[28]  Earl B. Hunt et al. Machine learning: An artificial intelligence approach (vol. 2): R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (Eds.). Los Altos, CA: Morgan Kaufmann, 1986. Pp. x + 738. $39.95, 1987.

[29]  George Loizou et al. The Nearest Neighbor and the Bayes Error Rates, 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[31]  David S. Broomhead et al. Multivariable Functional Interpolation and Adaptive Networks, 1988, Complex Syst.

[32]  Stephen Muggleton et al. Machine Invention of First Order Predicates by Inverting Resolution, 1988, ML.

[33]  Miroslav Kubat. Floating approximation in time-varying knowledge bases, 1989, Pattern Recognit. Lett.

[34]  David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[35]  Pat Langley et al. Models of Incremental Concept Formation, 1990, Artif. Intell.

[36]  Alan J. Katz et al. Robust Classifiers without Robust Features, 1990, Neural Computation.

[37]  Bernhard E. Boser et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[38]  David H. Wolpert et al. Stacked generalization, 1992, Neural Networks.

[39]  John Shawe-Taylor et al. Bounding Sample Size with the Vapnik-Chervonenkis Dimension, 1993, Discrete Applied Mathematics.

[40]  William A. Gale et al. A sequential algorithm for training text classifiers, 1994, SIGIR '94.

[41]  Umesh V. Vazirani et al. An Introduction to Computational Learning Theory, 1994.

[42]  Sebastian Thrun et al. Lifelong robot learning, 1993, Robotics Auton. Syst.

[43]  David H. Wolpert. The Lack of A Priori Distinctions Between Learning Algorithms, 1996.

[44]  Yoav Freund et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[45]  Ron Kohavi et al. Wrappers for feature selection, 1997.

[46]  Daphne Koller et al. Hierarchically Classifying Documents Using Very Few Words, 1997, ICML.

[47]  Teuvo Kohonen et al. The self-organizing map, 1990, Neurocomputing.

[48]  Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[49]  Ivan Bratko et al. Machine Learning and Data Mining: Methods and Applications, 1998.

[50]  Vladimir N. Vapnik et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[51]  M. Kubát et al. Using the Genetic Algorithm to Reduce the Size of a Nearest-Neighbor Classifier and to Select Relevant Attributes, 2001, International Conference on Machine Learning.

[52]  Matthew Richardson et al. Mining the network value of customers, 2001, KDD '01.

[53]  Amanda Clare et al. Knowledge Discovery in Multi-label Phenotype Data, 2001, PKDD.

[54]  Peter D. Turney. Robust Classification with Context-Sensitive Features, 2002, ArXiv.

[55]  Peter Norvig et al. Artificial intelligence - a modern approach, 2nd Edition, 2003, Prentice Hall series in artificial intelligence.

[56]  Ben Coppin et al. Artificial Intelligence Illuminated, 2004.

[57]  Teuvo Kohonen et al. Self-organized formation of topologically correct feature maps, 2004, Biological Cybernetics.

[58]  Marie desJardins et al. Evaluation and selection of biases in machine learning, 1995, Machine Learning.

[59]  Usama M. Fayyad et al. On the Handling of Continuous-Valued Attributes in Decision Tree Generation, 1992, Machine Learning.

[60]  Peter Clark et al. The CN2 Induction Algorithm, 1989, Machine Learning.

[61]  Jiebo Luo et al. Learning multi-label scene classification, 2004, Pattern Recognit.

[62]  Robert C. Holte et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, 1993, Machine Learning.

[63]  Sunita Sarawagi et al. Discriminative Methods for Multi-labeled Classification, 2004, PAKDD.

[64]  Gert Pfurtscheller et al. AI-based approach to automatic sleep classification, 2004, Biological Cybernetics.

[65]  J. Ross Quinlan et al. Induction of Decision Trees, 1986, Machine Learning.

[66]  Sean R. Eddy et al. What is dynamic programming?, 2004, Nature Biotechnology.

[67]  R. Schapire. The Strength of Weak Learnability, 1990, Machine Learning.

[68]  J. Ross Quinlan et al. Learning logical definitions from relations, 1990, Machine Learning.

[69]  Douglas H. Fisher et al. Knowledge Acquisition Via Incremental Conceptual Clustering, 1987, Machine Learning.

[70]  Gerhard Widmer et al. Learning in the Presence of Concept Drift and Hidden Contexts, 1996, Machine Learning.

[71]  Philipp Slusallek et al. Introduction to real-time ray tracing, 2005, SIGGRAPH Courses.

[72]  V. Vapnik. Estimation of Dependences Based on Empirical Data, 2006.

[73]  Geoffrey E. Hinton et al. Restricted Boltzmann machines for collaborative filtering, 2007, ICML '07.

[74]  Yehuda Koren et al. The BellKor Solution to the Netflix Grand Prize, 2009.

[75]  Wei-Yin Loh et al. Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.

[76]  Geoff Holmes et al. Classifier chains for multi-label classification, 2009, Machine Learning.

[77]  Kevin P. Murphy et al. Machine learning - a probabilistic perspective, 2012, Adaptive computation and machine learning series.

[78]  Yalin Baştanlar et al. Introduction to machine learning, 2014, Methods in molecular biology.

[79]  Jason Weston et al. #TagSpace: Semantic Embeddings from Hashtags, 2014, EMNLP.

[80]  Éva Tardos et al. Maximizing the Spread of Influence through a Social Network, 2015, Theory Comput.

[81]  Samy Bengio et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  A. Hall et al. Adaptive Switching Circuits, 2016.