Compressing multisets using tries
V. Gripon, M. Rabbat, V. Skachek and W. J. Gross, "Compressing multisets using tries," in Proceedings of Information Theory Workshop, Lausanne, Switzerland, pp. 647-651, September 2012.
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. One expects that encoding the (unordered) multiset should lead to significant savings in rate as compared to encoding an (ordered) sequence with the same words, since information about the order of words in the sequence corresponds to a permutation. We propose and analyze a practical multiset encoder/decoder based on the trie data structure. Encoding requires O(m(n + log m)) operations, and decoding requires O(mn) operations. Of particular interest is the case where the cardinality of the multiset scales as m = 2^n/c for some c > 1, as n → ∞. Under this scaling, and when the words in the multiset are drawn independently and uniformly, we show that the proposed encoding leads to an arbitrary improvement in rate over encoding an ordered sequence with the same words. Moreover, the expected length of the proposed codes in this setting is asymptotically within a constant factor of 5/3 of the lower bound.
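To make the idea of a trie-based multiset code concrete, here is a minimal sketch in Python. It is not the encoder analyzed in the paper: it is a simplified scheme assumed purely for illustration, in which each node of a binary trie of depth n records how many of its words continue into the left (bit 0) child. The names `encode`, `decode`, and `multiset_lower_bound_bits` are hypothetical, and the decoder is assumed to know both n and m. The helper computes the standard counting bound, the base-2 logarithm of the number of multisets of size m over 2^n symbols, against which the m·n bits of an ordered sequence can be compared.

```python
from math import comb, log2


def multiset_lower_bound_bits(n, m):
    """Counting bound: log2 of the number of size-m multisets over 2**n words."""
    return log2(comb(2**n + m - 1, m))


def encode(words, n):
    """Illustrative trie walk: emit, per non-empty node, the size of its left child."""
    bits = []

    def visit(group, depth):
        # `group` holds the words sharing the same first `depth` bits.
        # Empty subtrees and fully resolved words need no further bits.
        if not group or depth == n:
            return
        shift = n - 1 - depth
        left = [w for w in group if not (w >> shift) & 1]
        right = [w for w in group if (w >> shift) & 1]
        # Fixed width large enough for any value in 0..len(group).
        width = len(group).bit_length()
        bits.append(format(len(left), f"0{width}b"))
        visit(left, depth + 1)
        visit(right, depth + 1)

    visit(list(words), 0)
    return "".join(bits)


def decode(code, n, m):
    """Rebuild the multiset (as a sorted list) from the bit string."""
    pos = 0
    out = []

    def visit(count, prefix, depth):
        nonlocal pos
        if count == 0:
            return
        if depth == n:
            out.extend([prefix] * count)
            return
        width = count.bit_length()
        left = int(code[pos:pos + width], 2)
        pos += width
        visit(left, prefix << 1, depth + 1)
        visit(count - left, (prefix << 1) | 1, depth + 1)

    visit(m, 0, 0)
    return out


if __name__ == "__main__":
    words, n = [5, 1, 5, 3, 1, 5], 3
    code = encode(words, n)
    print(len(code), "bits, versus", n * len(words), "for the ordered sequence")
    print("counting bound:", multiset_lower_bound_bits(n, len(words)))
    print(decode(code, n, len(words)))
```

Because the decoder visits the left subtree before the right one, it returns the words in sorted order, which is one way of seeing that the order of the original sequence is never transmitted. The fixed-width counts are chosen for simplicity and are not claimed to reach the 5/3 factor stated above.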
Download manuscript.
Download presentation support.
Bibtex
@inproceedings{GriRabSkaGro20129,
author = {Vincent Gripon and Michael Rabbat and
Vitaly Skachek and Warren J. Gross},
title = {Compressing multisets using tries},
booktitle = {Proceedings of Information Theory
Workshop},
year = {2012},
address = {Lausanne, Switzerland},
month = {September},
pages = {647--651},
}


