Visual Explainability and Robustness through Language

Speaker: Riccardo Volpi - Naver Labs Europe, France
Tuesday, 4 June 2024, 1:45 p.m., Aula C (in-person only)

Abstract: In recent years, the vision-and-language paradigm has revolutionized the way we learn and rely on computer vision models. A major drawback of learning visual representations has always been the lack of data: by coupling a vision model with large, pre-trained language models, we can partially mitigate this issue by building on large amounts of previously learned information. In this talk, we will discuss how using language can i) broaden the comfort zone of vision models for tasks such as object detection and classification, and ii) improve their interpretability. We will go through the basics of the vision-and-language paradigm, highlight some of its inherent limitations, and discuss some innovative solutions, for example making CLIP-like models robust to arbitrary vocabularies selected by the user.
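To give a concrete sense of the "arbitrary vocabulary" setting the abstract refers to: in CLIP-style zero-shot classification, the class list is supplied by the user at inference time rather than fixed at training time, and the image is matched to whichever class prompt has the most similar embedding. The sketch below illustrates only the mechanics; the toy `encode_text` function is a hypothetical stand-in for a real CLIP encoder, not the speaker's method or any actual library API.

```python
import numpy as np

def encode_text(prompt: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a CLIP text encoder: a deterministic unit vector
    per prompt (a real encoder is a large neural network)."""
    local = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = local.normal(size=dim)
    return v / np.linalg.norm(v)

def zero_shot_classify(image_embedding, vocabulary, temperature=0.07):
    """Score an image embedding against an arbitrary user-chosen vocabulary."""
    # Embed one prompt per class; the vocabulary is free-form text.
    text_embeddings = np.stack(
        [encode_text(f"a photo of a {c}") for c in vocabulary]
    )
    # Cosine similarities (all vectors are unit norm), then a softmax.
    logits = text_embeddings @ image_embedding / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(vocabulary, probs))

# The class list is decided here, at inference time, by the user.
vocab = ["dog", "cat", "bicycle"]
# Pretend image embedding, perfectly aligned with the "dog" prompt.
image_emb = encode_text("a photo of a dog")
scores = zero_shot_classify(image_emb, vocab)
print(max(scores, key=scores.get))  # → dog
```

The talk's point is that this flexibility is also a weakness: nothing constrains the user's vocabulary to be well-posed, which motivates work on making such models robust to it.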

External contact: Vittorio Murino

Published: 18 April 2024