Paper information - A boosted distance metric: application to content based image retrieval and classification of digitized histopathology

Distance metrics are often used to compare the similarity of two objects, each represented by a set of features in a high-dimensional space. The Euclidean metric is a popular choice, employed in a variety of applications. Non-Euclidean distance metrics have also been proposed, and choosing a distance metric for a specific application or domain is a non-trivial task. Furthermore, most distance metrics treat each dimension or object feature as having the same relative importance in determining object similarity. In many applications, such as Content-Based Image Retrieval (CBIR), where images are quantified and then compared according to their image content, it may be beneficial to use a similarity metric in which features are weighted according to their ability to distinguish between object classes. In the CBIR paradigm, every image is represented as a vector of quantitative feature values derived from the image content, and a similarity measure is applied to determine which of the database images is most similar to the query. In this work, we present a boosted distance metric (BDM), in which individual features are weighted according to their discriminatory power, and compare the performance of this metric to 9 traditional distance metrics in a CBIR system for digital histopathology. We apply our system to three different breast tissue histology cohorts: (1) 54 breast histology studies corresponding to benign and cancerous images, (2) 36 breast cancer studies corresponding to low and high Bloom-Richardson (BR) grades, and (3) 41 breast cancer studies with high and low levels of lymphocytic infiltration. Over all 3 data cohorts, the BDM outperforms the 9 traditional metrics, achieving a greater area under the precision-recall curve.
In addition, we performed SVM classification using the BDM along with the traditional metrics, and found that the boosted metric achieves a higher classification accuracy (over 96%) in distinguishing between the tissue classes in each of the 3 data cohorts considered. The 10 similarity metrics were also used to generate similarity matrices between all samples in each of the 3 cohorts. For each cohort, each of the 10 similarity matrices was subjected to normalized cuts, resulting in a reduced-dimensional representation of the data samples. The BDM resulted in the best discrimination between tissue classes in the reduced embedding space.
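
The abstract does not give the formula for the BDM, so the following is only a minimal sketch of the general idea of weighting features by discriminatory power and then ranking database images by a weighted distance. It assumes AdaBoost-style weights learned from one-feature threshold classifiers and a weighted Euclidean form; the function names (`boosted_feature_weights`, `rank_by_weighted_distance`) and the choice of weak learner are hypothetical and are not taken from the paper.

```python
import numpy as np


def best_stump(x, y, w):
    """Best threshold classifier on a single feature.

    x: feature values, y: labels in {-1, +1}, w: sample weights summing to 1.
    Returns (weighted error, threshold, polarity).
    """
    best = (np.inf, 0.0, 1)
    for t in np.unique(x):
        for s in (1, -1):
            pred = np.where(x >= t, s, -s)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, s)
    return best


def boosted_feature_weights(X, y, rounds=10):
    """Assumed AdaBoost-style weighting: each round picks the feature whose
    stump has the lowest weighted error and credits that feature with the
    stump's vote. Returns per-feature weights normalized to sum to 1."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    alpha = np.zeros(d)
    for _ in range(rounds):
        stumps = [best_stump(X[:, j], y, w) for j in range(d)]
        j = int(np.argmin([s[0] for s in stumps]))
        err, t, s = stumps[j]
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
        a = 0.5 * np.log((1 - err) / err)
        alpha[j] += a
        pred = np.where(X[:, j] >= t, s, -s)
        w = w * np.exp(-a * y * pred)  # up-weight misclassified samples
        w = w / w.sum()
    return alpha / alpha.sum()


def rank_by_weighted_distance(query, database, alpha):
    """Indices of database rows ordered from most to least similar to the
    query under a feature-weighted Euclidean distance."""
    dist = np.sqrt((alpha * (database - query) ** 2).sum(axis=1))
    return np.argsort(dist)
```

With uniform `alpha` the ranking reduces to ordinary Euclidean ranking, which gives a baseline for comparing against the learned weights.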
