Cross-lingual C*ST*RD: English access to Hindi information

We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within one month in the context of DARPA's Surprise Language Exercise, which selected Hindi, a heretofore unstudied language, as the source language. Given the brief time, we could not create deep Hindi capabilities for all the modules; instead, we experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions.

Anton Leuski | Liang Zhou | Eduard H. Hovy | Chin-Yew Lin | Franz Josef Och | Ulrich Germann
