Real-time computerized annotation of pictures.

Developing effective methods for automated annotation of digital pictures continues to challenge computer scientists. The capability of annotating pictures by computers can lead to breakthroughs in a wide range of applications, including Web image search, online picture-sharing communities, and scientific experiments. In this work, the authors developed new optimization and estimation techniques to address two fundamental problems in machine learning. These new techniques serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system of fully automatic and high speed annotation for online pictures. In particular, the D2-clustering method, in the same spirit as k-means for vectors, is developed to group objects represented by bags of weighted vectors. Moreover, a generalized mixture modeling technique (kernel smoothing as a special case) for non-vector data is developed using the novel concept of Hypothetical Local Mapping (HLM). ALIPR has been tested by thousands of pictures from an Internet photo-sharing site, unrelated to the source of those pictures used in the training process. Its performance has also been studied at an online demo site where arbitrary users provide pictures of their choices and indicate the correctness of each annotation word. The experimental results show that a single computer processor can suggest annotation terms in real-time and with good accuracy.

[1]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[2]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  D. Freedman,et al.  Some Asymptotic Theory for the Bootstrap , 1981 .

[4]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[5]  B. S. Manjunath,et al.  Cortina: a system for large-scale, content-based web image retrieval , 2004, MULTIMEDIA '04.

[6]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Jiebo Luo,et al.  Probabilistic spatial context models for scene content understanding , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  James Ze Wang,et al.  Learning-based linguistic indexing of pictures with 2--d MHMMs , 2002, MULTIMEDIA '02.

[9]  James Ze Wang,et al.  SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Tomaso Poggio,et al.  Image Representations for Visual Learning , 1996, Science.

[11]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[12]  Tsuhan Chen,et al.  An active learning framework for content-based information retrieval , 2002, IEEE Trans. Multim..

[13]  Nuno Vasconcelos,et al.  A multiresolution manifold distance for invariant image similarity , 2005, IEEE Transactions on Multimedia.

[14]  C. Mallows A Note on Asymptotic Joint Normality , 1972 .

[15]  Jingrui He,et al.  Mean version space: a new active learning method for content-based image retrieval , 2004, MIR '04.

[16]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Shih-Fu Chang,et al.  VisualSEEk: a fully automated content-based image query system , 1997, MULTIMEDIA '96.

[18]  Wei-Ying Ma,et al.  Learning an image manifold for retrieval , 2004, MULTIMEDIA '04.

[19]  Yixin Chen,et al.  Image Categorization by Learning and Reasoning with Regions , 2004, J. Mach. Learn. Res..

[20]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[21]  Daniel Gatica-Perez,et al.  On image auto-annotation with latent space models , 2003, ACM Multimedia.

[22]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[23]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[24]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[25]  S. Rachev The Monge–Kantorovich Mass Transference Problem and Its Stochastic Applications , 1985 .

[26]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[27]  Peter J. Bickel,et al.  The Earth Mover's distance is the Mallows distance: some insights from statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[28]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[29]  Carlo Tomasi Learning theory: Past performance and future results , 2004, Nature.

[30]  Alberto Del Bimbo,et al.  Trademark matching and retrieval in sports video databases , 2007, MIR '07.

[31]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[32]  Paul A. Viola,et al.  Boosting Image Retrieval , 2004, International Journal of Computer Vision.