Challenges for Annotating Images for Sense Disambiguation

We describe an unusual data set of thousands of annotated images that exhibit interesting sense phenomena. Annotating images with natural language senses involves greater semantic complexity than disambiguating word senses in text. We discuss and illustrate these issues, including the distinction between word senses and iconographic senses.
