Machine Learning for Information Retrieval

In this thesis, we explore the use of machine learning techniques for information retrieval. More specifically, we focus on ad-hoc retrieval, which is concerned with searching large corpora to identify the documents relevant to user queries. This identification is performed through a ranking task: given a user query, an ad-hoc retrieval system ranks the corpus documents so that the documents relevant to the query ideally appear above the others.

In a machine learning framework, we are interested in learning algorithms that can benefit from limited training data to identify a ranker likely to achieve high retrieval performance over unseen documents and queries. This problem presents novel challenges compared to traditional learning tasks, such as regression or classification. First, our task is a ranking problem, which means that the loss for a given query cannot be written as a sum of individual losses incurred on each corpus document. Second, most retrieval queries present a highly unbalanced setup, in which the relevant documents account for only a very small fraction of the corpus. Third, ad-hoc retrieval corresponds to a kind of "double" generalization problem, since the learned model should generalize not only to new documents but also to new queries. Finally, our task also presents challenging efficiency constraints, since ad-hoc retrieval is typically applied to large corpora.

The main objective of this thesis is to investigate the discriminative learning of ad-hoc retrieval models. For that purpose, we propose different models based on kernel machines or neural networks adapted to different retrieval contexts. The proposed approaches rely on different online learning algorithms that allow efficient learning over large corpora.

The first part of the thesis focuses on text retrieval.
In this case, we adopt a classical approach to the retrieval ranking problem, and order the text documents according to their estimated similarity to the text query. The assessment of semantic similarity between text items plays a key role in that setup, and we propose a learning approach to identify an effective measure of text similarity. This measure is not learned from a set of queries with their corresponding relevant document sets, since such data are especially expensive to label and hence rare. Instead, we propose to rely on hyperlink data, since hyperlinks convey semantic proximity information that is relevant to similarity learning. This is hence a transfer learning setup, where we benefit from the proximity information encoded by hyperlinks to improve the performance over the ad-hoc retrieval task.

We then investigate another retrieval problem, i.e. the retrieval of images from text queries. Our approach introduces a learning procedure optimizing a criterion related to the ranking performance. This criterion adapts our previous learning objective for learning textual similarity to the image retrieval problem. This yields an image ranking model that addresses the retrieval problem directly, in contrast with previous research that relies on an intermediate image annotation task. Moreover, our learning procedure builds upon recent work on the online learning of kernel-based classifiers. This yields an efficient, scalable algorithm, which can benefit from recent kernels developed for image comparison.

In the last part of the thesis, we show that the objective function used in the previous retrieval problems can be applied to the task of keyword spotting, i.e. the detection of given keywords in speech utterances. For that purpose, we formalize this problem as a ranking task: given a keyword, the keyword spotter should order the utterances so that the utterances containing the keyword appear above the others.
Interestingly, this formulation yields an objective directly maximizing the area under the receiver operating characteristic (ROC) curve, the most common keyword spotter evaluation measure. This objective is then used to train a model adapted to this intrinsically sequential problem. The model is learned with a procedure derived from the algorithm previously introduced for the image retrieval task.

To conclude, this thesis introduces machine learning approaches for ad-hoc retrieval. We propose learning models for various multi-modal retrieval setups, i.e. the retrieval of text documents from text queries, the retrieval of images from text queries and the retrieval of speech recordings from written keywords. Our approaches rely on discriminative learning and enjoy efficient training procedures, yielding effective and scalable models. In all cases, links with prior approaches were investigated and experimental comparisons were conducted.
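
As an illustration of the ranking formulation used throughout, the following Python sketch computes the area under the ROC curve as the fraction of (relevant, non-relevant) pairs that a scoring function orders correctly, together with a pairwise hinge loss that upper-bounds the fraction of misordered pairs. It is not taken from the thesis: the function names, the margin of 1 and the toy scores are assumptions chosen for illustration only. Both quantities are defined over pairs of items, which is why such a loss cannot be decomposed into a sum of per-document losses.

```python
from itertools import product


def split_by_label(scores, labels):
    """Separate scores of relevant (label 1) and non-relevant (label 0) items."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one non-relevant item")
    return pos, neg


def auc_from_pairs(scores, labels):
    """AUC as the fraction of (relevant, non-relevant) pairs ranked correctly.

    Ties are counted as half a correct pair.
    """
    pos, neg = split_by_label(scores, labels)
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Average hinge loss over all (relevant, non-relevant) pairs.

    With margin >= 1, this value is an upper bound on 1 - AUC.
    """
    pos, neg = split_by_label(scores, labels)
    total = sum(max(0.0, margin - (p - n)) for p, n in product(pos, neg))
    return total / (len(pos) * len(neg))


# Hypothetical toy example: four items, two of which are relevant.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc_from_pairs(scores, labels))  # 0.75
print(pairwise_hinge_loss(scores, labels))
```

Minimizing the hinge surrogate pushes every relevant item above every non-relevant one by a margin. In this sketch all pairs are enumerated explicitly, which is only practical for small examples.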
