Machine Learning Training: Research Challenges and Opportunities for the Distributed System Community

Speaker: Giovanni Neglia - INRIA, Sophia Antipolis, France
Tuesday, January 28, 2020 at 11:00 AM, Sala Verde
In this talk, I will support the thesis that the Distributed System community is not meant simply to apply machine learning (ML) tools to solve specific problems, but can also contribute to the design of faster and more efficient distributed ML systems, both for training and for inference. I will first introduce machine learning training and show that computational speedups translate directly into better ML models. I will then explain why design choices for ML systems are inevitably entangled with optimization and statistical considerations. Finally, I will provide two examples from my recent research activity: dynamic (TCP-like) adaptation of the number of ML workers, and topology design.

Contact Person: D. Carra 

Publication date
January 8, 2020