MEGA: the maximizing expected generalization algorithm for learning complex query concepts

Specifying exact query concepts has become increasingly challenging for end users, because many query concepts (e.g., those for looking up a multimedia object) are hard to articulate, and articulation can be subjective. In this study, we propose a query-concept learner that learns query criteria through an intelligent sampling process. Our concept learner aims to fulfill two primary design objectives: (1) it must be expressive enough to model most practical query concepts, and (2) it must learn a concept quickly and from a small number of labeled examples, since online users tend to be too impatient to provide much feedback. To fulfill the first objective, we model query concepts in k-CNF, which can express almost all practical query concepts. To fulfill the second, we propose the maximizing expected generalization algorithm (MEGA), which converges to target concepts quickly through two complementary steps: sample selection and concept refinement. We also propose a divide-and-conquer method that divides the concept-learning task into G subtasks to achieve speedup. A task must be divided carefully, however, or search accuracy may suffer. Through analysis and mining results, we observe that organizing image features in a multiresolution manner and minimizing intragroup feature correlation can speed up query-concept learning substantially while maintaining high search accuracy. Through examples, analysis, experiments, and a prototype implementation, we show that MEGA converges to query concepts significantly faster than traditional methods.
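
To make the k-CNF representation concrete, the following minimal sketch shows one way a concept could be stored as a conjunction of disjunctions of at most k literals, and how a positively labeled example can prune it. It follows the classic elimination scheme for learning k-CNF (start from every candidate clause and drop those a positive example violates); it is not MEGA's sample-selection step or its actual refinement rules, which the abstract does not detail. The Boolean feature encoding and all function names here are assumptions made for illustration.

```python
from itertools import combinations


def all_clauses(n_vars, k):
    """Enumerate every disjunction of at most k literals over n_vars features.

    A literal is a pair (feature_index, polarity); a clause is a frozenset
    of literals. Clauses containing both x_i and not-x_i are skipped.
    """
    literals = [(i, p) for i in range(n_vars) for p in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(literals, size):
            if len({i for i, _ in combo}) == size:  # no repeated feature
                clauses.add(frozenset(combo))
    return clauses


def clause_satisfied(clause, x):
    """A disjunction holds if at least one of its literals agrees with x."""
    return any(x[i] == p for i, p in clause)


def matches(concept, x):
    """A k-CNF concept holds if every one of its clauses is satisfied."""
    return all(clause_satisfied(c, x) for c in concept)


def refine(concept, positive_example):
    """Keep only the clauses consistent with a positively labeled example."""
    return {c for c in concept if clause_satisfied(c, positive_example)}


# Hypothetical usage: three Boolean features, clauses of at most two literals.
concept = all_clauses(n_vars=3, k=2)
for example in [(True, True, False), (True, False, False)]:
    concept = refine(concept, example)

print(matches(concept, (True, True, False)))
print(matches(concept, (False, True, True)))
```

Starting from the most specific hypothesis and only ever removing clauses means each positive example can only generalize the concept, which is why the choice of which samples to ask the user about matters so much for convergence speed.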
