Huffman-Compressed Wavelet Trees for Large Alphabets

Abstract

In this web page we leave the slides of the presentation of the research work titled "Huffman-Compressed Wavelet Tree for Large Alphabets" presented at WTCA 2012 (Compression, Text and Algorithms), a satellite workshop of SPIRE 2012 (String Processing and Information Retrieval).

On it we show how we can reduce the size of a Huffman-shaped Wavelet Tree for sequences with large alphabets. For that purpose, we take two different ways:

By one hand, we explain how we can compress the mapping of a Huffman encoding (the mapping is a function that, given a symbol, returns its huffman code, and given a code returns its symbol) achieving better theoreticall and empirical results (we obtain mappings 7 times smaller than the current proposals). That is particular usefull for sequences with large alphabets (Natural Language, Oriental Languages, Floating-point sequences), where the number of different symbols is large and hence, the size of the mapping is not negligible.

On the other hand, we explain how we can represent a Huffman-shaped Wavelet Tree without using pointers. We present an experimental evaluation where we can see that we obtain representations that are from a 5% to a 20% smaller than several (common used) Wavelet Tree implementations.

Downloads

Slides of the work presented at the Workshop WTCA 2012.

About us

We are one researcher from the University of A Coruña (Spain) and one from the University of Chile (Chile).

Alberto Ordóñez Database Lab. University of A Coruña (UDC).
Gonzalo Navarro (phD). Computer Science Department. University of Chile (UChile).

This work was partially funded by the Spanish MICINN ref. AP2010-6038 (FPU Program) for Alberto Ordóñez, and Fondecyt Grant 1-110066, Chile for Gonzalo Navarro