Big Data Systems and Analytics (2020/2021)

Course code
cod wi: DT000049
Name of lecturer
Damiano Carra
Coordinator
Damiano Carra
Number of ECTS credits allocated
5
Academic sector
ING-INF/05 - INFORMATION PROCESSING SYSTEMS
Language of instruction
Italian
Location
VERONA
Period
A.Y. 20/21 PhD, from Oct 1, 2020 to Sep 30, 2021.

Learning outcomes

The course offers an overview of the fundamental concepts of distributed computing systems that handle very large datasets, together with the programming paradigms these systems adopt. In particular, it covers the MapReduce paradigm and its implementation in Spark. It also presents the system aspects of distributed computation, including data center architectures and solutions for storing such large datasets.
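As a rough illustration of the MapReduce paradigm mentioned above, the following is a minimal single-machine sketch in plain Python, not taken from the course material. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical; a real framework such as Hadoop or Spark would distribute these steps across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big systems"])` counts each lowercased word across both inputs. The programmer supplies only the map and reduce functions, while the grouping step is what a distributed system would perform over the network.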

Syllabus

- Introduction to the course
- The MapReduce programming paradigm
- Apache Hadoop and Apache Spark
- Non-relational databases for Big Data
- Datacenter architectures

Reference books
Author | Title | Publisher | Year | ISBN | Note
Jimmy Lin, Chris Dyer | Data-Intensive Text Processing with MapReduce (1st edition) | Morgan & Claypool Publishers | 2010 | 978-1608453429 |
Tom White | Hadoop: The Definitive Guide (3rd edition) | O'Reilly & Associates Inc | 2012 | 978-1449311520 |

Assessment methods and criteria

The exam consists of a project in which students apply the principles presented in class.
