Decompressing Massive Datasets

Speaker: Simon J. Puglisi, University of Helsinki
Tuesday, November 27, 2018 at 4:30 PM, Sala verde
Simple and fast decoding is one of the main advantages of the LZ-type text encoding used in many popular file compressors and software systems. However, the standard LZ decoding algorithm, which has not changed for 40 years, makes random accesses to the text and cannot be trivially modified to handle files whose decompressed size exceeds that of main memory. This talk explores two algorithmic approaches to scaling LZ decoding to massive datasets. The first approach yields the first external-memory algorithms for LZ decoding. We show that the I/O complexity of these algorithms is optimal and demonstrate through experiments that they are very fast in practice. The second approach is a suite of algorithms that use working space proportional to the size of the compressed data and that stream their output, avoiding any access to it during decoding.
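
To illustrate why the standard decoder depends on random access to the text, here is a minimal sketch of in-memory LZ77-style decoding. The phrase format (tuples tagged "lit" and "copy") and the function name lz_decode are assumptions made for this illustration; they are not taken from the talk, and this is not one of the algorithms it presents.

```python
def lz_decode(phrases):
    """Decode a list of LZ77-style phrases held entirely in memory.

    Hypothetical phrase format, assumed for this sketch:
      ("lit", ch)           append the single character ch
      ("copy", src, length) append `length` characters starting at
                            position `src` of the text decoded so far
    """
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, src, length = phrase
            # src may point anywhere in the decoded text, so each copy is
            # a random access into the output produced so far.
            for i in range(length):
                # Copying one character at a time lets a phrase overlap
                # the text it is itself producing.
                out.append(out[src + i])
    return "".join(out)


decoded = lz_decode([("lit", "a"), ("lit", "b"), ("copy", 0, 6)])
```

Because a copy phrase may refer to any earlier position, this sketch needs the whole decoded text in main memory, which is the limitation the abstract describes for files larger than RAM.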

Contact person: Zsuzsanna Lipták

Publication date: October 18, 2018