Knowledge-Driven Multidimensional Indexing Structure for Biomedical Media Database Retrieval

Today, biomedical media data are being generated at rates unimaginable only a few years ago. Content-based retrieval of biomedical media from large databases is becoming increasingly important to the clinical, research, and educational communities. In this paper, we present the recently developed entropy balanced statistical (EBS) k-d tree and its applications to biomedical media, including a high-resolution computed tomography (HRCT) lung image database and the first real-time protein tertiary structure search engine. Our index exploits statistical properties inherent in large-scale biomedical media databases to make searches efficient and accurate. By applying concepts from pattern recognition and information theory, the EBS k-d tree is built through top-down decision tree induction. Experiments show that similarity searches against a protein structure database of 53 363 structures consistently return the 100 most similar structures in less than 8.14 ms. The EBS k-d tree also improves retrieval precision over adaptive and statistical k-d trees: its retrieval precision is 81.6% for content-based retrieval of HRCT lung images and 94.9% at 10% recall for protein structure similarity search. The EBS k-d tree has considerable potential for use in biomedical applications that combine ground-truth knowledge with multidimensional signatures.
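To make the idea of top-down, entropy-guided tree induction more concrete, the following is a minimal sketch in Python. It is not the EBS k-d tree algorithm itself, whose split criterion and balancing rules are not given in this abstract. It only illustrates the general technique of choosing, at each node, the axis-aligned split that minimizes the weighted class entropy of the two children. The function names (`entropy`, `best_split`, `build`), the `leaf_size` parameter, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(points, labels):
    """Return (dimension, threshold, weighted_entropy) of the axis-aligned
    split with the lowest weighted child entropy, or None if no split exists."""
    n = len(points)
    best = None
    for d in range(len(points[0])):
        values = sorted(set(p[d] for p in points))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[d] <= t]
            right = [lab for p, lab in zip(points, labels) if p[d] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or score < best[2]:
                best = (d, t, score)
    return best


def build(points, labels, leaf_size=2):
    """Recursively build a tree of nested dicts, top down (hypothetical layout)."""
    split = None
    if len(points) > leaf_size and len(set(labels)) > 1:
        split = best_split(points, labels)
    if split is None:
        # Small, pure, or unsplittable node: store the items in a leaf.
        return {"leaf": list(zip(points, labels))}
    d, t, _ = split
    left = [(p, lab) for p, lab in zip(points, labels) if p[d] <= t]
    right = [(p, lab) for p, lab in zip(points, labels) if p[d] > t]
    return {
        "dim": d,
        "threshold": t,
        "left": build([p for p, _ in left], [lab for _, lab in left], leaf_size),
        "right": build([p for p, _ in right], [lab for _, lab in right], leaf_size),
    }


# Toy feature vectors with ground-truth class labels (illustrative only).
points = [(0.1, 5.0), (0.2, 1.0), (0.9, 4.0), (0.8, 2.0)]
labels = ["a", "a", "b", "b"]
tree = build(points, labels, leaf_size=1)
```

In this toy data the first feature separates the two classes cleanly, so the root splits on dimension 0 and both children become pure leaves. A production index would also need a nearest-neighbor search routine and a faster way to enumerate thresholds; both are omitted here.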
