Overview

Following the successful 2023 edition, we are organising the second Perception Test Challenge to benchmark multimodal perception models on the Perception Test (blog, GitHub), a diagnostic benchmark created by Google DeepMind that comprehensively probes the abilities of multimodal models across:

  • three modalities: video, audio, and text
  • four skill areas: Memory, Abstraction, Physics, Semantics
  • four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
  • six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation

You can try the Perception Test yourself here.

Check the Perception Test GitHub repo for details about the data and annotation formats, baselines, and metrics.

Check the Computer Perception workshop at ECCV 2022 for recorded talks and slides introducing the Perception Test benchmark.

Check the First Perception Test Challenge for details of the previous edition.

Perception Test overview slides from the 2024 workshop are available here.

Contact: viorica at google.com, perception-test at google.com