Machine Learning Training: Research Challenges and Opportunities for the Distributed System Community

Speaker: Giovanni Neglia - INRIA, Sophia Antipolis, France
  Tuesday, January 28, 2020, at 11:00, Sala Verde
In this talk, I will support the thesis that the Distributed System community should not simply apply machine learning (ML) tools to solve specific problems, but can also contribute to designing faster and more efficient distributed ML systems for both training and inference. I will first introduce machine learning training and show that computational speedups directly translate into better ML models. I will then explain why design choices for ML systems are inevitably entangled with optimization and statistical considerations. Finally, I will provide two examples from my recent research activity: dynamic (TCP-like) adaptation of the number of ML workers, and topology design.

January 8, 2020

