Distant viewing and multimodality theory: Prospects and challenges

This article discusses the prospects and challenges of combining multimodality theory with distant viewing, a recent framework proposed in the field of digital humanities. This framework advocates the use of computational methods to enable large-scale analysis of visual and multimodal materials, an analysis that must nevertheless be supported by theories that explain how these materials are structured. Multimodality theory is well-positioned to support this effort by providing descriptive schemas that impose structure on the materials under analysis. The field of multimodality research can also benefit from adopting computational methods, which help to achieve the long-term goal of building large multimodal corpora for empirical research. However, despite the immense potential of computational methods for multimodality research, their use warrants caution, because it involves a number of potentially cascading risks that arise from biases inherent in the underlying data and from different approaches to the phenomenon of multimodality.
