PhD in Computer Science

PhD course "Computational methods for handling textual data"

29 Feb - 23 March 2016

Academic staff
Ferdinando Cicalese , Giuditta Franco , Zsuzsanna Liptak

Series to which this belongs

29° ciclo
30° ciclo
31° Ciclo



In this course, we introduce basic computational methods for handling textual data. Textual data (strings, sequences) is ubiquitous in today's world: WWW

(webpages) and biological data (genomic and other biological sequences) are only two examples of large amounts of textual data which need to be handled on a daily basis. This type of data is being produced at an ever increasing rate, and one of the major computational challenges now is to develop data structures which allow both storing the data efficiently, and at the same time extracting information from it (e.g. search, pattern matching).

In this course, directed at a general audience of computer science PhD students, we give an introduction to this topic. The course starts with an introduction to basic information theoretic measures, compression, dictionary based compression. In the second part, we introduce three data structures for strings which have been milestones in the area of string storage. These are suffix trees, suffix arrays, and the Burrows-Wheeler transform (BWT). All of these datastructures have been or are being used in mainstream software for biological and other data. In this part, we will get an insight into of some of the majorchallenges in this research area. Finally, in the third part of the course, we give an introduction to an application of these concepts to genomic sequences.


  • pdf   Flyer   (pdf, it, 139 KB, 15/03/16)