A Textual-Visual-Entailment-based Unsupervised Algorithm for Cheapfake Detection

The rapid growth of online communication has spread misinformation in many forms. "Cheapfake" is a recently coined term for manipulated media produced with non-AI techniques. One of the most prevalent ways to create a cheapfake is simply to alter the context of an image or video with a misleading caption. The "ACMMM 2022 Grand Challenge on Detecting Cheapfakes" poses the problem of catching such out-of-context misuse to assist fact-checkers, since detecting conflicting image-caption sets helps narrow the search space. To address this challenge, we propose a multimodal heuristic method. The proposed method extends the challenge's baseline (i.e., COSMOS) with four additional components (i.e., Natural Language Inference, Fabricated Claims Detection, Visual Entailment, and Online Caption Checking) to overcome the current weaknesses of the baseline. At evaluation time, our method achieves up to 89.1% accuracy on Task 1, which is 7.2% higher than the baseline, and 73% accuracy on Task 2. The code for our solution is publicly available on GitHub (https://github.com/pwnyniche/acmmmcheapfake2022), and the Docker image can be found on DockerHub (https://hub.docker.com/repository/docker/tqtnk2000/acmmmcheapfakes).
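The abstract describes combining the COSMOS baseline with four auxiliary component checks. As a minimal sketch only, assuming each component emits a boolean conflict flag and that flags are combined with a simple logical-OR heuristic (the actual combination rule is not specified in the abstract), the decision step might look like:

```python
def is_out_of_context(cosmos_flag: bool,
                      nli_contradiction: bool,
                      fabricated_claim: bool,
                      visual_entailment_conflict: bool,
                      online_check_mismatch: bool) -> bool:
    """Hypothetical decision rule: flag an image-caption set as
    out-of-context if the COSMOS baseline or any of the four
    auxiliary components reports a conflict.

    All parameter names and the OR-combination are illustrative
    assumptions, not the paper's confirmed method.
    """
    return (cosmos_flag
            or nli_contradiction
            or fabricated_claim
            or visual_entailment_conflict
            or online_check_mismatch)


# Example: only the NLI component detects a contradiction.
print(is_out_of_context(False, True, False, False, False))  # True
```

In such a design, each auxiliary component can rescue cases the baseline misses, at the cost of potentially raising the false-positive rate; a weighted or majority-vote rule would be a natural alternative.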