Effective dimensionality reduction in multimedia applications

In multimedia information retrieval, multimedia data such as images and videos are represented as vectors in a high-dimensional space. To search these vectors efficiently, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically as dimensionality increases, a phenomenon known as the curse of dimensionality. To mitigate it, dimensionality reduction methods have been proposed; they map feature vectors in the high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes an improvement to a previously proposed dimensionality reduction method. The previous method stores the norm and an approximated angle for every subvector. However, the multiple angle components require additional storage space and many cosine computations. We propose an alternative method that employs a single angle component instead of a separate angle for each subvector. Because only one angle is kept for all the subvectors, more information about the original data vector is lost, which degrades performance slightly; in return, both the storage space and the number of cosine computations are reduced. Finally, we verify the superiority of the proposed approach through extensive experiments with synthetic and real-life data sets.
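The trade-off described above can be illustrated with a minimal sketch. It is not the paper's formulation, and several details are assumptions made only for illustration: the angle is measured against an all-ones reference vector, subvectors are equal-length contiguous chunks, distances are estimated with the law of cosines, and all function names are hypothetical. In particular, the way `dist_single_angle` combines the subvector norms with the single angle (taking the larger of two lower bounds) is one possible choice, not a formula taken from the abstract.

```python
import math

def _norm(x):
    return math.sqrt(sum(c * c for c in x))

def _angle_to_ones(x):
    """Angle between x and the all-ones vector (an assumed reference vector)."""
    n = _norm(x)
    if n == 0.0:
        return 0.0
    cos_t = sum(x) / (n * math.sqrt(len(x)))
    return math.acos(max(-1.0, min(1.0, cos_t)))

def split(v, k):
    """Split v into k contiguous subvectors of equal length."""
    if k <= 0 or len(v) % k != 0:
        raise ValueError("len(v) must be a positive multiple of k")
    m = len(v) // k
    return [v[i * m:(i + 1) * m] for i in range(k)]

def reduce_multi_angle(v, k):
    """One (norm, angle) pair per subvector: 2k stored values."""
    return [(_norm(s), _angle_to_ones(s)) for s in split(v, k)]

def reduce_single_angle(v, k):
    """One norm per subvector plus one angle for the whole vector: k + 1 values."""
    return [_norm(s) for s in split(v, k)], _angle_to_ones(v)

def dist_multi_angle(a, b):
    """Law-of-cosines estimate summed over subvectors; k cosine evaluations."""
    sq = sum(na * na + nb * nb - 2.0 * na * nb * math.cos(ta - tb)
             for (na, ta), (nb, tb) in zip(a, b))
    return math.sqrt(max(0.0, sq))

def dist_single_angle(a, b):
    """Illustrative estimate with a single cosine evaluation (assumed combination)."""
    norms_a, ta = a
    norms_b, tb = b
    # Bound from subvector norms alone (every angle difference treated as zero).
    by_norms = sum((x - y) ** 2 for x, y in zip(norms_a, norms_b))
    # Bound from the whole-vector norms and the single stored angle.
    na = math.sqrt(sum(x * x for x in norms_a))
    nb = math.sqrt(sum(y * y for y in norms_b))
    by_angle = na * na + nb * nb - 2.0 * na * nb * math.cos(ta - tb)
    return math.sqrt(max(0.0, by_norms, by_angle))
```

Under these assumptions both estimates never exceed the true Euclidean distance, because the difference of two angles to a common reference is at most the angle between the two vectors. The multi-angle form stores 2k values and evaluates k cosines per comparison, whereas the single-angle form stores k + 1 values and evaluates one cosine.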
