Data-intensive computing systems (2015/2016)

Course code
4S001412
Name of lecturer
Damiano Carra
Coordinator
Damiano Carra
Number of ECTS credits allocated
6
Academic sector
INF/01 - INFORMATICS
Language of instruction
Italian
Period
First semester, from Oct 1, 2015 to Jan 29, 2016.

Lesson timetable

First semester
Day Time Type Place Note
Thursday 2:30 PM - 4:30 PM lesson Lecture Hall A  
Friday 3:30 PM - 4:30 PM lesson Laboratory Alfa  
Friday 4:30 PM - 6:30 PM laboratory Laboratory Alfa

Learning outcomes

This course provides a broad introduction to the fundamentals of large-scale parallel computing systems that deal with very large data sets. The course topics cover programming models (MapReduce, Pregel), algorithmic design (text processing, inverted indexing, graph analysis), and system architecture (datacenter topologies, communication, failure management).

Syllabus

- Programming frameworks -
Distributed filesystems (HDFS), NoSQL systems (HBase, Cassandra), data and graph processing (MapReduce, Pregel), SQL-like systems (Pig, Hive).

- Algorithms -
Design of algorithms for text processing, inverted indexing, and graph analysis (PageRank).

- Datacenter architectures -
Topologies (VL2, PortLand, c-Through), communication protocols (spanning tree, ECMP, OpenFlow), failure management.

Reference books
Author Title Publisher Year ISBN Note
Jimmy Lin, Chris Dyer Data-Intensive Text Processing with MapReduce (1st edition) Morgan & Claypool Publishers 2010 978-1608453429
Tom White Hadoop: The Definitive Guide (3rd edition) O'Reilly & Associates Inc 2012 978-1449311520

Assessment methods and criteria

The examination consists of a project and its accompanying documentation.
