Programming laboratory for bioinformatics (2020/2021)

Academic staff: Rosalba Giugno
Teaching is organised as follows:
Activity | Credits | Period | Academic staff
Theory | 6 | Semester II, Semester I | Rosalba Giugno
Laboratory | 6 | Semester II, Semester I | Rosalba Giugno

Learning outcomes

Knowledge and understanding: The course aims to provide students with knowledge and understanding of the paradigms and advanced programming tools used to manage biomedical and bioinformatics data and information.

Applying knowledge and understanding: Students will be able to (a) apply these paradigms and advanced programming tools to the analysis of genomic, transcriptomic and proteomic data; (b) analyse code performance, identify critical issues and optimise them.

Making judgements: Ability to independently propose effective and efficient solutions in the biomedical and bioinformatics application domain, and to identify critical issues in the treatment of complex bioinformatics problems.

Communication: Students will be able to interact with a range of interlocutors in a multidisciplinary biomedical and bioinformatics context, to collaborate with colleagues on group work, and to interact with interlocutors in a working or research environment.

Lifelong learning skills: Ability to understand the scientific literature when interpreting results or proposed solutions, and to carry out individual and group in-depth studies aimed at tackling problems from the research and business worlds.


R Programming
Overview and History of R
Workspace and Files
Objects and Data Structures
Missing Values
Sequences of Numbers
Split-Apply-Combine Functions
Reading Tabular Data
Control Structures
I/O operations
Base Graphics
Advanced Graphics

Bash Scripting Language
Overview of scripting languages
Indexed arrays
Associative arrays
Conditional statements and operators
Comparison operators
I/O from files

R for Bioinformatics
Overview of Bioconductor
Basic Bioconductor Data Structures: IRanges and GenomicRanges
Classes and functions for representing biological strings: Biostrings
Classes and functions for representing genomes: BSgenome, GenomicRanges
Annotation functions and overview of annotation web tools

RNA-Seq Data Analysis using R/Python and web tools
Introduction to NGS technologies and experimental design
Data Pre-processing, from FASTQ to BAM
Indexing the Reference Genome
Mapping reads to a reference genome
Sorting and indexing alignments
Mapping quality control
Variant Discovery and Callset Refinement
Differential Analysis
limma, Glimma, edgeR
Practice on the detection and analysis of coding RNAs and ncRNAs

Applied Statistics for High-Throughput Data Mining
Introduction to variables and distributions
Linear modeling
Linear and generalized linear modeling
Model matrix and model formulae
Analysis of categorical variables, exploratory data analysis, multiple testing
Unsupervised analysis
Distance in high dimensions
Principal component analysis and multidimensional scaling
Unsupervised clustering
Partition Methods
Hierarchical Methods
Density based methods
Batch effects

Advanced Analyses of biological data in R: methods for graphs and networks.
Networks in igraph
Create networks
Edge, vertex, and network attributes
Specific graphs and graph models
Reading network data from files
Turning networks into igraph objects
Plotting networks with igraph
Network and node descriptives
Distances and paths
Subgroups and communities
Assortativity and Homophily
Reconstruction and analysis of co-regulatory and co-expressed networks

The course includes special seminars on advanced topics such as computational methods for the analysis of single-cell data, graph mining and multilayer networks. Topics are chosen each year according to current trends in medical bioinformatics research. Students will be able to use software related to the chosen topics and analyse real cases.

Assessment methods and criteria

The exam consists of a written part (A) and a project (B). In part A, students write, on the day of the test, an R program that solves a given problem using genomic, transcriptomic or proteomic data. Part B is a project agreed with the teacher: students request it by email and arrange an appointment to define its specifications (the project remains valid throughout the academic year). Projects have different levels of difficulty, and each level corresponds to a maximum achievable mark.

Parts A and B are each marked out of 30.

The final mark is calculated as min(31, (A + B)/2 + C),
where C takes a value in the interval [-4, +4] and reflects the maturity and scientific autonomy the student has acquired while completing the test and the project, as shown in the presentation and in the interpretation of the scientific literature and scientific context of the project.
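As a worked example of the rule above, the short Python sketch below computes the final mark from A, B and C. The function name final_mark is hypothetical, and two choices are assumptions not stated in the rule: no rounding is applied, and A and B are assumed to lie between 0 and 30.

```python
def final_mark(a: float, b: float, c: float) -> float:
    """Final mark per the stated rule: min(31, (A + B) / 2 + C).

    a: mark for part A (written test), out of 30.
    b: mark for part B (project), out of 30.
    c: adjustment, which the rule restricts to [-4, +4].
    No rounding is applied (assumption: the rule does not specify one).
    """
    # Assumption: marks range from 0 to 30; only the upper bound is implied by "out of 30".
    if not (0 <= a <= 30 and 0 <= b <= 30):
        raise ValueError("A and B must be marks between 0 and 30")
    # C must lie in the interval stated in the assessment rules.
    if not -4 <= c <= 4:
        raise ValueError("C must lie in [-4, +4]")
    # Average the two parts, add the adjustment, then cap at 31.
    return min(31, (a + b) / 2 + c)


# Worked examples
print(final_mark(26, 28, 2))   # (26 + 28) / 2 + 2 = 29.0
print(final_mark(30, 30, 3))   # 30 + 3 = 33, capped at 31
print(final_mark(24, 20, -4))  # (24 + 20) / 2 - 4 = 18.0
```

Since (A + B)/2 is at most 30, the cap only takes effect when C is greater than 1 and both parts are marked high enough for the total to exceed 31.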

Reference books
Activity | Author | Title | Year
Theory | Roger D. Peng | Exploratory Data Analysis with R | 2016
Theory | Michael I. Love, Simon Anders, Vladislav Kim, Wolfgang Huber | RNA-Seq workflow: gene-level exploratory analysis and differential expression | 2015
Laboratory | Roger D. Peng | Exploratory Data Analysis with R | 2016