In this course, we introduce basic computational methods for handling textual data. Textual data (strings, sequences) is ubiquitous in today's world: the World Wide Web (webpages) and biological data (genomic and other biological sequences) are only two examples of large amounts of textual data which need to be handled on a daily basis. This type of data is being produced at an ever-increasing rate, and one of the major computational challenges now is to develop data structures which allow both storing the data efficiently and, at the same time, extracting information from it (e.g. search, pattern matching).
In this course, directed at a general audience of computer science PhD students, we give an introduction to this topic. The course starts with an introduction to basic information-theoretic measures, compression, and dictionary-based compression. In the second part, we introduce three data structures for strings which have been milestones in the area of string storage: suffix trees, suffix arrays, and the Burrows-Wheeler transform (BWT). All of these data structures have been or are being used in mainstream software for biological and other data. In this part, we will gain insight into some of the major challenges in this research area. Finally, in the third part of the course, we give an introduction to an application of these concepts to genomic sequences.
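To give a first flavour of two of the data structures named above, the following is a minimal Python sketch of a suffix array and of the Burrows-Wheeler transform derived from it. It is an illustration under stated assumptions, not the construction used in practice: the function names `suffix_array` and `bwt` and the end-of-text sentinel `$` are chosen here for the example, and the suffixes are sorted naively by direct comparison, which is far slower than the specialised algorithms used on genome-sized inputs.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    listed in lexicographic order of the suffixes (naive sorting)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of `text`.

    Assumption for this sketch: `sentinel` does not occur in `text`
    and compares smaller than every character of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    sa = suffix_array(s)
    # For each suffix in sorted order, take the character that precedes it.
    # For the suffix starting at 0, s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


if __name__ == "__main__":
    print(suffix_array("banana$"))  # [6, 5, 3, 1, 0, 4, 2]
    print(bwt("banana"))            # annb$aa
```

In the example, the transform groups equal characters of `banana` into runs (`nn`, `aa`), which is the property that makes the BWT useful as a preprocessing step for compression.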