Structured gradient boosting

The goal of many machine learning problems can be formalized as learning a function that correctly classifies an input vector, given a set of examples of that function's input-output behavior. While this formalism has produced a number of success stories, there are notable situations in which it falls short. One such situation arises when the class label is composed of multiple variables, each of which may be correlated with all or part of the input vector and with the other output variables. Such problems, known as structured prediction problems, are common in information retrieval, computational linguistics, and computer vision, among other fields. In this dissertation, I will discuss structured prediction problems and some of the previous approaches to solving them. I will then present a new algorithm, structured gradient boosting, that combines the strong points of previous approaches while retaining their generality. More specifically, the algorithm combines notions of margin maximization from support vector methods with the speed and flexibility of the structured perceptron algorithm. Finally, I will show a number of novel ways in which this algorithm can be applied effectively, highlighting applications in learning by demonstration and music information retrieval.
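To make the two ingredients named above concrete, here is a minimal sketch, not the dissertation's structured gradient boosting algorithm, of a structured perceptron for sequence labeling whose training-time decoding is loss-augmented. Every position labeled differently from the gold sequence earns a bonus of `margin`, so updates continue until the gold labeling beats every alternative by at least its Hamming loss, which is one common way to inject a margin into perceptron training. The emission/transition feature templates, the `<s>` start symbol, the function names (`features`, `viterbi`, `train`), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

START = "<s>"  # assumed start-of-sequence symbol for the first transition


def features(tokens, labels):
    """Feature counts phi(x, y): emission (label, token) and transition (prev, label)."""
    phi = defaultdict(float)
    prev = START
    for tok, lab in zip(tokens, labels):
        phi[("emit", lab, tok)] += 1.0
        phi[("trans", prev, lab)] += 1.0
        prev = lab
    return phi


def viterbi(tokens, label_set, w, gold=None, margin=1.0):
    """Return the highest-scoring label sequence under weights w.

    If gold is given, each position labeled differently from gold earns a
    bonus of `margin` (loss-augmented decoding with Hamming loss).
    Assumes tokens is non-empty.
    """
    best = []  # best[i][y] = (score of best prefix ending in y at i, backpointer)
    for i, tok in enumerate(tokens):
        column = {}
        for y in label_set:
            local = w.get(("emit", y, tok), 0.0)
            if gold is not None and y != gold[i]:
                local += margin
            if i == 0:
                column[y] = (w.get(("trans", START, y), 0.0) + local, None)
            else:
                column[y] = max(
                    (best[i - 1][p][0] + w.get(("trans", p, y), 0.0) + local, p)
                    for p in label_set
                )
        best.append(column)
    # Trace back from the best-scoring final label.
    y = max(label_set, key=lambda lab: best[-1][lab][0])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return path[::-1]


def train(data, label_set, epochs=10, margin=1.0):
    """Margin-augmented structured perceptron: w += phi(x, gold) - phi(x, pred)."""
    w = {}
    for _ in range(epochs):
        for tokens, gold in data:
            pred = viterbi(tokens, label_set, w, gold=gold, margin=margin)
            if pred != gold:
                for k, v in features(tokens, gold).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in features(tokens, pred).items():
                    w[k] = w.get(k, 0.0) - v
    return w


# Toy part-of-speech-style data, invented for illustration.
labels = ["D", "N", "V"]
data = [
    ("the dog runs".split(), ["D", "N", "V"]),
    ("a cat sleeps".split(), ["D", "N", "V"]),
    ("the cat runs".split(), ["D", "N", "V"]),
]
w = train(data, labels)
print(viterbi("a dog sleeps".split(), labels, w))  # ['D', 'N', 'V']
```

Setting `margin=0.0` recovers the plain structured perceptron. A boosting-style variant would, roughly speaking, fit a weak learner (for example, a regression tree) to the same gold-minus-prediction signal each round instead of adding it directly to a weight table; that step is deliberately left out of this sketch.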