Advanced document retrieval techniques for patent research

Latent semantic indexing (LSI) can be used in patent searching to overcome the drawbacks of Boolean searching and to improve retrieval accuracy. LSI combines the vector space model (VSM) of document retrieval with singular value decomposition (SVD), using linear algebra to uncover relationships between words in the text. Results can be improved by clustering the text, by tailoring the SVD parameters to the specific corpus (here, patents), and by applying techniques that address ambiguities in language.
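The combination of VSM and SVD described above can be illustrated with a minimal sketch. It is not the paper's implementation: the four toy documents, the five-term vocabulary, the choice of k = 2, and the helper names (`to_vector`, `cosine`, `lsi_scores_for`) are all assumptions made for illustration, and NumPy is assumed to be available. The sketch builds a term-document matrix, keeps the top k singular vectors, and scores a query against each document in the reduced space.

```python
import numpy as np

# Hypothetical toy corpus; real patent text would need tokenization and weighting.
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = [
    "car engine",
    "automobile engine",
    "flower petal",
    "flower petal",
]

def to_vector(text):
    """Raw term-count vector over the fixed vocabulary (the VSM representation)."""
    words = text.split()
    return np.array([float(words.count(t)) for t in terms])

# Term-document matrix: one row per term, one column per document.
A = np.column_stack([to_vector(d) for d in docs])

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsi_scores_for(query, k):
    """Score the query against every document in a rank-k latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                      # top-k left singular vectors (term space)
    doc_coords = U_k.T @ A              # documents projected into the latent space
    q_coords = U_k.T @ to_vector(query) # query projected the same way
    return [cosine(q_coords, doc_coords[:, j]) for j in range(A.shape[1])]

query = "car"
raw_scores = [cosine(to_vector(query), A[:, j]) for j in range(A.shape[1])]
lsi_scores = lsi_scores_for(query, k=2)

for text, raw, lsi in zip(docs, raw_scores, lsi_scores):
    print(f"{text!r}: raw VSM = {raw:.3f}, LSI = {lsi:.3f}")
```

In this toy corpus the query "car" shares no term with the document "automobile engine", so its raw VSM similarity is zero, as it would be missed by an exact-match Boolean search. After the rank-2 projection the two engine documents collapse onto the same latent direction, so that document scores close to 1 while the flower documents stay close to 0. The value of k is the kind of SVD parameter that would need tuning for a real patent corpus.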
