The course gives an overview of the fundamental concepts of distributed computing systems that handle very large datasets, together with the programming paradigms these systems adopt. In particular, it covers the MapReduce paradigm and its implementation in Spark. It also presents the system aspects of distributed computation, including data center architectures and solutions for storing large datasets.
- Introduction to the course
- The MapReduce programming paradigm
- Apache Hadoop and Apache Spark
- Non-relational databases for Big Data
- Datacenter architectures
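
To give a feel for the MapReduce paradigm listed above, here is a minimal single-machine sketch of a word count in plain Python. It is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are hypothetical and do not come from the course material, and a real system such as Hadoop or Spark would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    # Illustrative only: here everything runs in one process, in order.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data centers store big data"]
    print(word_count(sample))
```

The point of the split is that each call to `map_phase` and each call to `reduce_phase` depends only on its own input, which is what lets a distributed framework spread the work over many nodes.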
|Author|Title|Publisher|Year|ISBN|
|---|---|---|---|---|
|Jimmy Lin, Chris Dyer|Data-Intensive Text Processing with MapReduce (1st edition)|Morgan & Claypool Publishers|2010|978-1608453429|
|Tom White|Hadoop: The Definitive Guide (3rd edition)|O'Reilly & Associates Inc|2012|978-1449311520|
The exam consists of a project in which students apply the principles presented in class.