Beyond Scaling: Architecting Adaptive and Collaborative Multimodal Intelligence

Speaker: Loris Bazzani - Università degli Studi di Verona
  Tuesday, 19 May 2026, 16:30, Aula D (CV1)
Abstract:
The prevailing AI paradigm is approaching a critical juncture where brute-force scaling of general-purpose models yields diminishing returns. As training costs escalate into the hundreds of millions, the industry remains challenged by frozen models that lack adaptability and struggle with the long-tail complexities of real-world applications. This talk proposes a transition toward Adaptive and Collaborative Multimodal Intelligence: systems natively designed to adapt in a lightweight manner to environments where data is scarce and restricted, and to interact with humans. We will explore three fundamental pillars necessary to bridge the gap between foundational research and industrial applications:
  1. Controllable Multimodal Data Generation & Privacy: active generation to deal with the long tail of rare events and privacy-restricted domains.
  2. Multimodal Adaptation & Specialization: leveraging adaptation techniques to customize models into domain-specific vertical experts.
  3. Human-AI Co-Design: integrating multimodal signals (language, gestures, and spatial clicks) as primary algorithmic constraints to facilitate collaboration between humans and AI.
The presentation will broadly review my research of the past few years across academia and industry and define future directions. I will zoom in on one of my recent works to demonstrate the value of these pillars: "Interactive Episodic Memory with User Feedback" (CVPR 2026), which illustrates how integrating interactive memory allows models to better collaborate with humans.
 
Bio:
Loris Bazzani is an AI Research Leader with over 15 years of experience, spanning classical computer vision and machine learning to today’s foundation and multimodal generative models. He is currently an adjunct professor at the University of Verona. In his previous role as Principal Scientist at Amazon (where he spent almost a decade), he led core research and product efforts across Prime Video, Alexa, and shopping, co-developing architectures for video understanding, vision-language representation, Large Multimodal Models, and diffusion models. His work powered features such as live sports highlights, virtual try-on, interactive product recommendations, and shopping assistants, reaching millions of users and delivering significant business impact. Loris obtained his Ph.D. in Computer Science from the University of Verona (Italy) in 2012, supervised by Prof. Vittorio Murino and Prof. Marco Cristani. He held postdoctoral positions at Dartmouth College with Prof. Lorenzo Torresani, and at the Italian Institute of Technology with Prof. Vittorio Murino. His research has been published in top-tier venues including CVPR, ICCV, ECCV, and ICML, with 50+ publications and patents: https://lorisbaz.github.io/

Contact person
Alessandro Farinelli

Publication date
23 April 2026