Invited Talk in Workshop: Multimodal Representation Learning (MRL): Perks and Pitfalls
Injecting large models with new modalities for Video Understanding
Arsha Nagrani
Abstract:
Large models have had an 'explosion' moment recently, achieving state-of-the-art results across a wide range of benchmarks and tasks. In this talk we discuss how they can be adapted to novel vision and audio inputs for multimodal tasks, either by influencing model design or by serving as frozen components in multimodal architectures. We focus on multimodal video captioning tasks such as ASR and automatic audio description (AD) for movies, and cover several papers recently accepted at CVPR 2023.