The course aims to provide the programming and interpretation tools needed to analyze genomic, transcriptomic and proteomic data produced by the latest generation of technologies. For each subject, theoretical lessons are followed by practical sessions in the laboratory.
On completing the course, students will be able to program the appropriate analysis pipeline for the data to be analyzed and the biomedical question to be answered. They will also be able to interpret the results obtained.
Programming in R. Introduction. Data structures: vectors, matrices, lists, data frames. Functions. Input/output. Visualization, the grammar of graphics and ggplot2.
Statistics: median, MAD, rank tests, Spearman correlation, robust linear models, multiple testing, linear models.
Programming with Bioconductor. Structure, principles and functions. Sequence alignment and aligners, experimental design, batch effects and confounding, RNA-Seq data analysis and differential expression, methylation analysis, CNV analysis, microarray analysis. Annotation resources, gene set enrichment analysis.
Introduction and Basics of Programming in Python and Bash.
Advanced analysis algorithms: clustering and classification; resampling (cross-validation, bootstrap, and permutation tests); biological network analysis.
Teaching material (mainly based on continuously updated scientific articles and online programming guides) is available on the University's e-learning platform for the course.
Rafael A. Irizarry and Michael I. Love, Data Analysis for the Life Sciences, 2015. https://leanpub.com/dataanalysisforthelifesciences/
The exam consists of a written part (A) and the development of a project (B). Part A consists of developing an R program to solve a given problem using genomic, transcriptomic or proteomic data. Part B is the development of a project agreed upon with the teacher; students request it by email and then arrange an appointment to draw up the specifications (the project remains valid throughout the academic year). Projects have different levels of difficulty, and each level of difficulty corresponds to a maximum achievable mark. Students will also take an oral interview to discuss parts A and B.
Concerning part A, students who attend the course are entitled to take two intermediate tests scheduled during the year. Each test consists of developing an R program for biomedical data analysis. The two tests yield a single cumulative mark out of thirty, which is communicated to the students at the end of the course.
Concerning part B, students who attend the course will be able to present to the class their project, the research context in which it is set, and its state of progress.
Parts A and B are each marked out of thirty.
The final mark is calculated as min(31, (A + B) / 2 + C).
C lies in the interval [-4, +4] and reflects the maturity and scientific autonomy acquired while working on the tests and the project, in presenting the work, and in interpreting the scientific literature and the scientific context of the project.
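As an illustration only, the final-mark formula above can be sketched in Python. The function name `final_mark` is hypothetical. The lower bound of 0 for A and B is an assumption (the text only says they are marked out of thirty). No rounding is applied, since the text does not specify any.

```python
def final_mark(a: float, b: float, c: float) -> float:
    """Return min(31, (A + B) / 2 + C), as stated in the text.

    Assumptions (not stated in the source): A and B lie in [0, 30],
    and the result is left unrounded.
    """
    if not (0 <= a <= 30 and 0 <= b <= 30):
        raise ValueError("A and B are expected to be marks out of thirty")
    if not (-4 <= c <= 4):
        raise ValueError("C is expected to lie in the interval [-4, +4]")
    # Average the two parts, apply the adjustment C, then cap at 31.
    return min(31, (a + b) / 2 + c)


if __name__ == "__main__":
    # Average of 25, lowered by one point.
    print(final_mark(24, 26, -1))
    # Average of 30 plus 4 would give 34, so the cap at 31 applies.
    print(final_mark(30, 30, 4))
```

With A = 24, B = 26 and C = -1, the average is 25 and the final mark is 24. With A = B = 30 and C = 4, the uncapped value would be 34, so the cap of 31 applies.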