Evaluating Multimedia and Language Tasks
George Awad | Ian Soboroff | Asad Butt | Keith Curtis
[1] David Nistér,et al. Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[2] Mark D. Smucker,et al. A System for Efficient High-Recall Retrieval , 2018, SIGIR.
[3] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[4] George Awad,et al. Evaluation of automatic video captioning using direct assessment , 2017, PloS one.
[5] Tsvi Kuflik,et al. From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences (Dagstuhl Perspectives Workshop 17442) , 2018, Dagstuhl Manifestos.
[6] Ben Carterette,et al. The effect of assessor error on IR system evaluation , 2010, SIGIR.
[7] Alan F. Smeaton,et al. The scholarly impact of TRECVid (2003-2009) , 2011, J. Assoc. Inf. Sci. Technol.
[8] Emine Yilmaz,et al. Research Frontiers in Information Retrieval Report from the Third Strategic Workshop on Information Retrieval in Lorne (SWIRL 2018) , 2018 .
[9] Jonathan G. Fiscus,et al. TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval , 2019, TRECVID.
[10] José Luis Vicedo González,et al. TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol.
[11] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[13] Huizhong Chen,et al. The Stanford mobile visual search data set , 2011, MMSys.
[14] Ellen M. Voorhees,et al. On Building Fair and Reusable Test Collections using Bandit Techniques , 2018, CIKM.
[15] Ben Carterette. The Best Published Result is Random: Sequential Testing and its Effect on Reported Effectiveness , 2015, SIGIR.
[16] Ani Nenkova,et al. Evaluating Content Selection in Summarization: The Pyramid Method , 2004, NAACL.
[17] John M. Conroy,et al. An Assessment of the Accuracy of Automatic Evaluation in Summarization , 2012, EvalMetrics@NAACL-HLT.
[18] Mark T. Maybury,et al. Automatic Summarization , 2002, Computational Linguistics.
[19] Emine Yilmaz,et al. Estimating average precision when judgments are incomplete , 2007, Knowledge and Information Systems.
[20] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Ellen M. Voorhees,et al. Variations in relevance judgments and the measurement of retrieval effectiveness , 1998, SIGIR '98.
[22] Marc El-Bèze,et al. Question Answering Evaluation Survey , 2006, LREC.
[23] Paul Over,et al. Building Better Search Engines by Measuring Search Quality , 2014, IT Professional.
[24] Emine Yilmaz,et al. A statistical method for system evaluation using incomplete judgments , 2006, SIGIR.
[25] A. Lommel. Blues for BLEU : Reconsidering the Validity of Reference-Based MT Evaluation , 2016 .
[26] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[27] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[28] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[29] Karen Spärck Jones. Automatic summarising: The state of the art , 2007, Inf. Process. Manag.
[30] Samuel B. Williams,et al. ASSOCIATION FOR COMPUTING MACHINERY , 2000 .
[31] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[32] Christopher Kanan,et al. Challenges and Prospects in Vision and Language Research , 2019, Front. Artif. Intell..
[33] Cyril W. Cleverdon,et al. The significance of the Cranfield tests on index languages , 1991, SIGIR '91.
[34] Wai Lam,et al. Evaluation Challenges in Large-Scale Document Summarization , 2003, ACL.
[35] Djoerd Hiemstra,et al. Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002 , 2003, SIGF.
[36] Ellen M. Voorhees,et al. Evaluating Question Answering System Performance , 2008 .
[37] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[38] Hugh Willmott,et al. Challenges and prospects , 2015 .
[39] Charles L. A. Clarke,et al. Reliable information retrieval evaluation with incomplete and biased judgements , 2007, SIGIR.