Transdisciplinary Analysis of a Corpus of French Newsreels: The ANTRACT Project

ANTRACT is a cross-disciplinary research project dedicated to the analysis of the French newsreel company Les Actualités Françaises (1945-1969) and its film productions. Founded during the liberation of France, this state-owned company filmed more than 20,000 news reports, shown in French cinemas and throughout the world, over its 24 years of activity. The project brings together research organizations with a dual historical and technological perspective. ANTRACT’s goal is to study the production process, the film content, the way historical events are represented, and the audience reception of Les Actualités Françaises newsreels, using innovative AI-based data processing tools developed by partners specialized in image, audio, and text analysis. This article focuses on the data processing apparatus and tools of the project. Automatic content analysis is used to select data, to segment video units and typescript images, and to align them with their archival descriptions. Automatic speech recognition provides a textual representation of the voice-over recording, from which natural language processing extracts named entities; automatic visual analysis detects and recognizes the faces of well-known figures in the videos. These multifaceted data can then be queried and explored with the TXM text-mining platform. The results of these automatic analyses feed the Okapi platform, a client-server application that integrates documentation, information retrieval, and hypermedia capabilities within a single environment based on Semantic Web standards. The complete corpus of Les Actualités Françaises, enriched with data and metadata, will be made available to the scientific community by the end of the project.
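To make the idea of "multifaceted data" concrete, the following is a minimal sketch of how the outputs of the three analysis streams (speech transcription, named-entity extraction, face recognition) might be attached to a single news report and queried together. The record layout, field names, identifiers, and the helper function are all hypothetical illustrations; they are not the project's actual data model, nor the TXM or Okapi APIs.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NewsReport:
    """One news report plus the outputs of the automatic analyses.

    Hypothetical schema for illustration only.
    """
    report_id: str
    archival_description: str
    transcript: str = ""                                      # from speech recognition
    named_entities: Set[str] = field(default_factory=set)     # from NLP on the voice-over
    recognized_faces: Set[str] = field(default_factory=set)   # from visual analysis


def reports_naming_and_showing(reports: List[NewsReport], person: str) -> List[str]:
    """Return ids of reports where `person` is both named in the
    voice-over and recognized on screen."""
    return [
        r.report_id
        for r in reports
        if person in r.named_entities and person in r.recognized_faces
    ]


# Placeholder records; ids, descriptions, and names are invented.
reports = [
    NewsReport("AF-0001", "Placeholder description A",
               named_entities={"Person A"}, recognized_faces={"Person A"}),
    NewsReport("AF-0002", "Placeholder description B",
               named_entities={"Person A"}, recognized_faces={"Person B"}),
]

print(reports_naming_and_showing(reports, "Person A"))  # ['AF-0001']
```

Keeping each modality in its own field lets a query combine them freely, for example to find reports where someone is mentioned in the commentary but never appears on screen.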
