This course provides a broad introduction to the fundamentals of large-scale parallel computing systems that deal with very large data sets.
By the end of the course, students must demonstrate that they know and understand how data-intensive analysis systems work, and that they can evaluate the benefits and limitations of the different solutions.
* Programming frameworks:
-- Distributed filesystems (HDFS);
-- Data and graph processing (MapReduce, Pregel);
-- SQL-like systems (Pig, Hive);
-- NoSQL systems (HBase, Cassandra).
* Algorithms:
-- Design of algorithms for text processing;
-- Indexing algorithms (inverted indexing);
-- Graph analysis (PageRank).
* Datacenter architectures:
-- Datacenter organization;
-- Datacenter networking;
-- Failure management.
|Jimmy Lin, Chris Dyer||Data-Intensive Text Processing with MapReduce (1st edition)||Morgan & Claypool Publishers||2010||978-1608453429|
|Tom White||Hadoop: The Definitive Guide (3rd edition)||O'Reilly & Associates Inc||2012||978-1449311520|
The examination consists of a project and its accompanying documentation. The project verifies that the student understands the course contents and can apply them to solve a problem. The project topic is agreed with the teacher and focuses on a specific case study. The project includes a performance evaluation for different input sizes and an evaluation of the implementation alternatives. After the project documentation has been evaluated, the student may take an oral exam in which the details of the project are discussed.