Information recognition and retrieval for bioinformatics (2016/2017)

Manuele Bicego
Teaching is organised as follows:
Activity    Credits  Period  Academic staff
Theory      9        I sem.  Manuele Bicego, Rosalba Giugno
Laboratory  3        I sem.  Pietro Lovato

Learning outcomes

The course provides the theoretical and applied foundations of Pattern Recognition, a class of automatic methodologies used to recognize and retrieve information from biological data. In particular, the course presents and discusses the main aspects of this area: representation, classification, clustering and validation. The focus is on the description of the methodologies employed rather than on the details of application programs (already covered in other courses).

At the end of the course, students will be able to analyse a biological problem from a Pattern Recognition perspective; they will also have the skills needed to devise, develop and implement the different components of a Pattern Recognition system.


The course generally requires standard skills obtained from other courses of the first two years, with particular emphasis on basic notions of probability, statistics, and mathematical analysis.

The course is divided into three parts:
Part 1. The first part is devoted to the description and analysis of the different methodologies for the representation, classification and clustering of biological data.

Part 2. The second part, more application-oriented, is devoted to the critical analysis of some relevant bioinformatics problems that are typically solved with classification or clustering approaches (e.g. gene expression data analysis, medical image segmentation, protein remote homology detection).

Part 3. The third part (in the lab) is devoted to the implementation, in MATLAB, of some of the algorithms analysed in the first two parts.

Detailed Program

Theory (72 h):
- Introduction to Pattern Recognition
- Data Representation
- Bayes decision theory
- Generative and discriminative classifiers
- Validation
- Neural Networks
- Hidden Markov Models
- Clustering methods
- Clustering validation
- Applications

Lab (36 h):
- Introduction to MATLAB
- Data representation and standardization
- Principal Component Analysis
- Gaussians and Gaussian classifiers
- Hidden Markov Models

Reference books
R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley, 2001
P. Baldi, S. Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 2001
A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, 1988

Assessment methods and criteria

The exam verifies the following skills:
- the ability to describe clearly and concisely the different components of a Pattern Recognition system
- the ability to analyse, understand and describe a Pattern Recognition system (or a given part of it) applied to a biological problem

The exam consists of two parts:
i) a written exam containing questions on topics presented during the course (15 points available). The written part is passed if the grade is greater than or equal to 8.
ii) an oral presentation of a scientific paper published in a relevant bioinformatics journal during 2015. The paper is chosen by the candidate and approved by the instructor (15 points available).

The two parts of the exam can be passed separately; the final grade is the sum of the two grades.
The exam as a whole is passed if the final grade is greater than or equal to 18. Each grade remains valid for the whole academic year.
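
As a rough illustration, the grading rules above can be written as a small check. This is only a sketch: the function and constant names are hypothetical, and it assumes that the oral part has no minimum of its own, since no threshold for it is stated here.

```python
# Point limits and thresholds taken from the assessment rules above.
WRITTEN_MAX = 15
ORAL_MAX = 15
WRITTEN_PASS = 8   # minimum grade to pass the written part
FINAL_PASS = 18    # minimum sum of the two grades to pass the exam


def final_grade(written, oral):
    """Return (total, passed) for a pair of partial grades.

    Assumption (not stated in the rules): the oral part has no
    separate minimum, so only the written threshold and the total
    are checked.
    """
    if not 0 <= written <= WRITTEN_MAX:
        raise ValueError("written grade must be between 0 and 15")
    if not 0 <= oral <= ORAL_MAX:
        raise ValueError("oral grade must be between 0 and 15")
    total = written + oral
    passed = written >= WRITTEN_PASS and total >= FINAL_PASS
    return total, passed
```

For example, a written grade of 7 fails the check even with a full oral score, because the written threshold of 8 is not met.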

Teaching aids
Title  Format (Language, Size, Publication date)
1. Introduction  pdf (it, 5094 KB, 03/10/16)
2. Representation  pdf (it, 10178 KB, 03/10/16)
3. Bayes decision theory  pdf (it, 546 KB, 11/10/16)
4. Generative classifiers  pdf (it, 2444 KB, 11/10/16)
5. Discriminative classifiers  pdf (it, 1148 KB, 11/10/16)
6. Classifier validation  pdf (it, 324 KB, 11/10/16)
7. Introduction to clustering  pdf (it, 423 KB, 11/11/16)
8. Clustering - similarity  pdf (it, 254 KB, 11/11/16)
9. Clustering methodologies  pdf (it, 798 KB, 11/11/16)
10. Clustering validation  pdf (it, 316 KB, 11/11/16)
11. Hidden Markov Models  pdf (it, 1032 KB, 21/11/16)
12. Neural networks  pdf (it, 500 KB, 21/11/16)
13. Applications - part 1  pdf (it, 6157 KB, 19/12/16)
14. Applications - part 2  pdf (it, 6974 KB, 19/12/16)
Seminar instructions  pdf (it, 56 KB, 07/11/16)
Assigned seminars  pdf (it, 42 KB, 25/09/17)
Lab 01 - Intro MATLAB  zip (it, 2160 KB, 03/10/16)
Lab 01 - Solutions  zip (it, 1 KB, 17/10/16)
Lab 02 - Intro MATLAB 2  zip (it, 1061 KB, 10/10/16)
Lab 02 - Solutions  zip (it, 3 KB, 17/10/16)
Lab 03 - Standardization, PCA  zip (it, 325 KB, 17/10/16)
Lab 03 - Solutions  zip (it, 2 KB, 24/10/16)
Lab 04 - Gaussians  zip (it, 190 KB, 24/10/16)
Lab 04 - Solutions  zip (it, 4 KB, 07/11/16)
Lab 05 - Parzen Windows  zip (it, 256 KB, 07/11/16)
Lab 05 - Solutions  zip (it, 5 KB, 14/11/16)
Lab 06 - KNN  zip (it, 279 KB, 14/11/16)
Lab 06 - Solutions  zip (it, 23 KB, 21/11/16)
Lab 07 - PRTools 1  zip (it, 880 KB, 21/11/16)
Lab 07 - Solutions  zip (it, 0 KB, 28/11/16)
Lab 08 - PRTools 2  pdf (it, 130 KB, 28/11/16)
Lab 08 - Solutions  zip (it, 0 KB, 12/12/16)
Lab 09 - Kmeans  zip (it, 306 KB, 12/12/16)
Lab 09 - Solutions  zip (it, 1 KB, 19/12/16)
Lab 10 - HMM  zip (it, 703 KB, 19/12/16)
Lab 10 - Solutions  zip (it, 277 KB, 09/01/17)
Lab 11 - Review  zip (it, 268 KB, 09/01/17)