| Module | Credits | Period | Lecturer |
|---|---|---|---|
| Theory | 4 | 2nd semester | Nicola Bombieri |
| Laboratory | 2 | 2nd semester | Nicola Bombieri |
This course provides theoretical and practical knowledge about the programming and analysis of advanced computational architectures, with emphasis on multiprocessor and GPU platforms. At the end of the course the student will be able to:
- identify techniques for parallel programming, also in a research context, through analysis of application efficiency, considering both functional and non-functional design constraints (correctness, performance, energy consumption);
- analyze performance and profile code, identifying critical regions and the corresponding optimizations in light of the architectural characteristics of the platform;
- compare parallel patterns and select the most suitable one for a given use case;
- when defining the structure of the optimized code, identify the proper architectural choices, considering the target application and platform contexts.
Finally, the student will be able to continue studying autonomously in the field of parallel programming languages and of software development for parallel embedded platforms.
Theory module (32 h):
-) Intro to parallelism and parallel architectures.
-) Programming parallel architectures.
-) Models of parallel programming.
-) Measurement and analysis of performance, Amdahl's law and metrics for performance analysis.
-) Pipeline: basic and advanced concepts.
-) Instruction-level parallelism (ILP).
-) Advanced techniques of branch prediction, static scheduling, and speculation.
-) Memory hierarchy: basic and advanced concepts.
-) Advanced optimization techniques of cache performance.
-) Thread-level parallelism (TLP).
-) Cache coherency in shared-memory architectures, Snoopy protocols.
-) General-purpose computing on Graphics Processing Units (GP-GPU).
-) Intro to non-functional constraints: power consumption and energy efficiency.
-) Deep learning at-the-edge: models and inference with architectural constraints (performance, energy efficiency, memory bandwidth, etc.).
-) Quantization and pruning of neural networks for inference on embedded architectures.
Lab module (24 h):
-) Parallel compilers for multicore architectures (OpenMP).
-) Parallel compilers for cluster architectures (MPI).
-) GP-GPU programming: CUDA.
-) Intelligent Video Analysis (Deep Learning + stream analysis) on embedded architectures.
To pass the exam, the student must demonstrate:
- understanding of the principles of parallel architectures;
- the ability to describe the concepts in a clear and exhaustive way, without digressions;
- the ability to apply the acquired knowledge to solve application scenarios presented through exercises, questions, and projects.
The exam consists of a written test containing multiple-choice questions, open-answer questions, and exercises related to both the theory and lab modules. Alternatively, the student can develop a project assigned by the teacher.
| Module | Authors | Title | Publisher | Year | ISBN |
|---|---|---|---|---|---|
| Theory | John Hennessy, David Patterson | Computer Architecture: A Quantitative Approach (6th edition) | Morgan Kaufmann | 2018 | 9780128119051 |
| Theory | David B. Kirk, Wen-mei W. Hwu | Programming Massively Parallel Processors: A Hands-on Approach (3rd edition) | Morgan Kaufmann | 2017 | 978-0-12-811986-0 |