Paper information - Large-Scale Generic Image Recognition and Image Representation

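In recent years, the growth of the Internet has made it easy to obtain large numbers of images together with auxiliary information such as tags, and attempts to build generic image recognition systems from this large-scale information have become increasingly active. This paper introduces current trends in generic image recognition using large-scale image datasets. Large-scale image recognition often relies on linear classifiers in order to maintain scalability. Because image representation is the key to achieving sufficient discriminative power even with a linear classifier, the paper also reviews recent image representation methods.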

Harada Tatsuya
