Beyond Scaling: Architecting Adaptive and Collaborative Multimodal Intelligence - Dip. Informatica - Università degli Studi di Verona

Speaker: Loris Bazzani - Università degli Studi di Verona

Tuesday, 19 May 2026, at 16:30, Sala Verde (in person and online)

Abstract:
The prevailing AI paradigm is approaching a critical juncture where brute-force scaling of general-purpose models yields diminishing returns. As training costs escalate into the hundreds of millions, the industry remains challenged by frozen models that lack adaptability and struggle to deal with the long-tail complexities of real-world applications. This talk proposes a transition toward Adaptive and Collaborative Multimodal Intelligence: systems natively designed to be adaptable in a lightweight manner to environments where data is scarce and restricted and to interact with humans. We will explore three fundamental pillars necessary to bridge the gap between foundational research and industrial applications:

Controllable Multimodal Data Generation & Privacy: active generation to deal with the long tail of rare events and privacy-restricted domains.
Multimodal Adaptation & Specialization: leveraging adaptation techniques to customize models into domain-specific vertical experts.
Human-AI Co-Design: integrating multimodal signals (language, gestures, and spatial clicks) as primary algorithmic constraints to facilitate collaboration between human and AI.

The presentation will broadly review my research of the past few years across academia and industry and define future directions. I will zoom in on one of my recent works to demonstrate the value of the aforementioned pillars: "Interactive Episodic Memory with User Feedback" (CVPR 2026), which illustrates how integrating interactive memory allows models to better collaborate with humans.

Bio:

Loris Bazzani is an AI Research Leader with over 15 years of experience, spanning classical computer vision and machine learning to today’s foundation and multimodal generative models. He is currently an adjunct professor at the University of Verona. In his previous role as Principal Scientist at Amazon (where he spent almost a decade), he led core research and product efforts across Prime Video, Alexa, and shopping, co-developing architectures for video understanding, vision-language representation, Large Multimodal Models, and diffusion models. His work powered features such as live sports highlights, virtual try-on, interactive product recommendations, and shopping assistants, reaching millions of users and delivering significant business impact. Loris obtained his Ph.D. in Computer Science from the University of Verona (Italy) in 2012, supervised by Prof. Vittorio Murino and Prof. Marco Cristani. He held postdoctoral positions at Dartmouth College with Prof. Lorenzo Torresani, and at the Italian Institute of Technology with Prof. Vittorio Murino. His research has been published in top-tier venues including CVPR, ICCV, ECCV, and ICML, with 50+ publications and patents: https://lorisbaz.github.io/

Link: https://univr.zoom.us/j/81959130316

Meeting ID: 819 5913 0316

Contact person: Alessandro Farinelli
Publication date: 23 April 2026

Strada le Grazie 15
37134 Verona