A boosted distance metric: application to content based image retrieval and classification of digitized histopathology

Distance metrics are often used to compare two objects, each represented by a set of features in a high-dimensional space. The Euclidean metric is a popular choice, employed in a variety of applications. Non-Euclidean distance metrics have also been proposed, and choosing a distance metric for a specific application or domain is a non-trivial task. Furthermore, most distance metrics treat every dimension, or object feature, as equally important in determining object similarity. In many applications, such as Content-Based Image Retrieval (CBIR), where images are quantified and then compared according to their content, it may be beneficial to use a similarity metric in which features are weighted according to their ability to distinguish between object classes. In the CBIR paradigm, every image is represented as a vector of quantitative feature values derived from the image content, and a similarity measure is applied to determine which of the database images is most similar to the query. In this work, we present a boosted distance metric (BDM), in which individual features are weighted according to their discriminatory power, and compare its performance to that of 9 traditional distance metrics in a CBIR system for digital histopathology. We apply our system to three breast tissue histology cohorts: (1) 54 breast histology studies corresponding to benign and cancerous images, (2) 36 breast cancer studies corresponding to low and high Bloom-Richardson (BR) grades, and (3) 41 breast cancer studies with high and low levels of lymphocytic infiltration. Across all 3 cohorts, the BDM outperforms the 9 traditional metrics, yielding a greater area under the precision-recall curve.
In addition, we performed SVM classification using the BDM and each of the traditional metrics, and found that the boosted metric achieves a higher classification accuracy (over 96%) in distinguishing between the tissue classes in each of the 3 data cohorts considered. The 10 similarity metrics were also used to generate similarity matrices between all samples in each of the 3 cohorts. For each cohort, each of the 10 similarity matrices was subjected to normalized cuts, resulting in a reduced-dimensional representation of the data samples. The BDM yielded the best discrimination between tissue classes in the reduced embedding space.
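
The abstract does not state the BDM's formula, so the following is only a minimal sketch of the general idea of weighting features by discriminatory power. It assumes AdaBoost over single-feature threshold stumps and a weighted Euclidean distance; both choices, and all function names, are illustrative assumptions rather than the method used in this work.

```python
import numpy as np

def best_stump(X, y, w):
    """Find the single-feature threshold stump with the lowest weighted error.

    Returns (error, feature index, threshold, polarity).
    """
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def boosted_feature_weights(X, y, n_rounds=10):
    """Accumulate AdaBoost stump weights (alpha) per feature; labels y are in {-1, +1}.

    Assumption: a feature's importance is the normalized sum of the alphas
    of the stumps that selected it.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # per-sample weights
    feat_w = np.zeros(d)             # per-feature accumulated alpha
    for _ in range(n_rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        if alpha <= 0:               # stump no better than chance: stop
            break
        feat_w[j] += alpha
        pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
    total = feat_w.sum()
    return feat_w / total if total > 0 else np.full(d, 1.0 / d)

def weighted_distance(a, b, feat_w):
    """Weighted Euclidean distance between feature vectors a and b."""
    return float(np.sqrt(np.sum(feat_w * (a - b) ** 2)))

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = np.array([[0, 5], [1, 1], [2, 4], [8, 2], [9, 5], [10, 1]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])
feat_w = boosted_feature_weights(X, y)
print(feat_w, weighted_distance(X[0], X[3], feat_w))
```

In a retrieval setting, database images would then be ranked by `weighted_distance` to the query's feature vector, so that differences along uninformative features contribute little to the ranking.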