LDA-Based Retrieval Framework for Semantic News Video Retrieval

Topic-based language models have attracted much attention in recent years with the rise of semantic retrieval. For error-prone ASR text in particular, a topic representation is more reasonable than an exact term representation. Among these models, Latent Dirichlet Allocation (LDA) is noted for its ability to discover latent topic structure and is broadly applied in many text-related tasks. So far, however, its application in information retrieval (IR) has been limited to supplementing standard document models, and it has been pointed out that directly employing the basic LDA model hurts retrieval performance. In this paper, we propose a lexicon-guided two-level LDA retrieval framework. It uses HowNet to guide the parameter estimation of the first-level LDA model, and then constructs the second-level LDA models from the first-level model's inference results. We evaluate the framework on the TRECVID 2005 ASR collection and compare it with the vector space model (VSM) and latent semantic indexing (LSI). Our experiments show that the proposed method is very competitive.
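To make concrete what it means for LDA to act as a supplement to a standard document model, the following minimal sketch interpolates a Dirichlet-smoothed unigram document model with an LDA-based term probability, P_lda(w|d) = sum over topics z of P(w|z) P(z|d). It is not the lexicon-guided two-level framework proposed here. The function names, the toy topic distributions, and the values of `lam` and `mu` are all illustrative assumptions, and the topic parameters are supplied by hand rather than estimated.

```python
import math

def lda_term_prob(word, doc_topics, topic_words):
    """P_lda(w|d) = sum_z P(w|z) * P(z|d), using given (not trained) parameters."""
    return sum(p_z * topic_words[z].get(word, 0.0)
               for z, p_z in enumerate(doc_topics))

def dirichlet_term_prob(word, doc_counts, coll_probs, mu):
    """Dirichlet-smoothed unigram probability of a word under a document."""
    doc_len = sum(doc_counts.values())
    return (doc_counts.get(word, 0) + mu * coll_probs.get(word, 0.0)) / (doc_len + mu)

def query_log_likelihood(query, doc_counts, doc_topics, topic_words,
                         coll_probs, lam=0.7, mu=1000.0):
    """Log query likelihood under the interpolated document model."""
    score = 0.0
    for w in query:
        p = (lam * dirichlet_term_prob(w, doc_counts, coll_probs, mu)
             + (1.0 - lam) * lda_term_prob(w, doc_topics, topic_words))
        if p <= 0.0:
            return float("-inf")  # query word unseen everywhere
        score += math.log(p)
    return score

# Toy data (hypothetical): topic 0 is politics, topic 1 is sports.
topic_words = [
    {"election": 0.4, "vote": 0.3, "ballot": 0.3},
    {"goal": 0.5, "match": 0.5},
]
coll_probs = {"election": 0.2, "vote": 0.2, "ballot": 0.1, "goal": 0.25, "match": 0.25}

doc_a = {"election": 3, "vote": 2}   # political story, never says "ballot"
doc_b = {"goal": 3, "match": 2}      # sports story
topics_a = [0.9, 0.1]                # assumed P(z|d) for doc_a
topics_b = [0.1, 0.9]                # assumed P(z|d) for doc_b

query = ["ballot"]
score_a = query_log_likelihood(query, doc_a, topics_a, topic_words, coll_probs)
score_b = query_log_likelihood(query, doc_b, topics_b, topic_words, coll_probs)
print(score_a > score_b)
```

In this toy setup neither document contains the query term, so the smoothed unigram component gives both the same probability. Only the topic component separates them, which is the kind of inexact match that matters when ASR transcripts drop or misrecognize words.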
