The course offers an overview of the fundamental concepts of distributed computing systems that handle very large datasets, together with the programming paradigms these systems adopt. In particular, it discusses the MapReduce paradigm and its implementation in Spark. It also presents the system aspects of distributed computation, including data center architectures and solutions for storing such large datasets.
- Introduction to the course
- The MapReduce programming paradigm
- Apache Hadoop and Apache Spark
- Non-relational databases for Big Data
- Datacenter architectures
The exam consists of carrying out a project that applies the principles presented in class.