Learning From Data

Machine learning allows computational systems to adaptively improve their performance with experience accumulated from observed data. Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title "Learning From Data" because it faithfully describes what the subject is about, and we made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover.

Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems.

Learning from data is a very dynamic field. Some hot techniques and theories turn out to be fads, while others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable them to venture out and explore further techniques and theories, or perhaps to contribute their own.

The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
