IBM T.J. Watson Research Center, Multimedia Analytics: Modality Classification and Case-Based Retrieval Tasks of ImageCLEF2012

In this paper we present the modeling strategies that the IBM T.J. Watson research team applied to the modality classification and case-based retrieval tasks of ImageCLEF 2012. The primary challenges of this year's medical modality classification task were as follows: 1) the supplied training data was extremely limited, with some categories having as few as 5 positive examples, leaving little room for internal testing, and 2) some modalities appeared to be visually similar. To address these challenges, we approached the task on two fronts: 1) we attempted to augment the training data with additional examples of each category, and 2) we experimented with a broad range of modeling strategies and feature extraction techniques. For the case-based retrieval task, we employed a semantic similarity approach to measure the relatedness among medical concepts found in the text corpus. We believe that relying on the UMLS Metathesaurus alone, without additional lexical databases, led to poor performance relative to other approaches.
