LZW compression
Concept
Lempel-Ziv-Welch(LZW)
- lossless compression
- fixed length code, usually 12 bits (257-4095)
- doesn’t require a dictonary to be sent alongside
- dictionary is rebuilt built during decoding
- used in:
- GIF
Algorithm
- initialize the dictionary, implicitly starting with all ASCII characters
- start a substring buffer
w = "" - take a new character
k - check if
w + kis in the dictionary- if already inside:
w = w + k, positive lookahead
- else:
- add a new entry in the dictionary for
w + k - set
w = k
- add a new entry in the dictionary for
- if already inside:
- repeat from step 3 until the end of the string
next_code = 257
w = ""
for each character k in input:
if w + k is in dictionary:
w = w + k
else:
output code(w)
add (w + k) to dictionary with next_code
next_code++
w = k
output code(w)
abaabc
| dict | output | |||
|---|---|---|---|---|
| abaabc ^* | 257 | ’ab’ | 97 | ’a’ |
| abaabc ^* | 258 | ’ba’ | 98 | ’b’ |
| abaabc ^* | 259 | ’aa’ | 97 | ’a’ |
| abaabc ^^* | 260 | ’abc’ | 257 | ’ab’ |
| abaabc ^ | 99 | ’c’ |
abaabc -> 97 98 97 257 99
6×8 bits 5×12 bits
Application
Encoding
ABBAABBAABBAABBAABBA
| dict | output | |||
|---|---|---|---|---|
| ABBAABBAABBAABBAABBA ^* | 257 | ’AB’ | 65 | ’A’ |
| ABBAABBAABBAABBAABBA ^* | 258 | ’BB’ | 66 | ’B’ |
| ABBAABBAABBAABBAABBA ^* | 259 | ’BA’ | 66 | ’B’ |
| ABBAABBAABBAABBAABBA ^* | 260 | ’AA’ | 65 | ’A’ |
| ABBAABBAABBAABBAABBA ^^* | 261 | ’ABB’ | 257 | ’AB’ |
| ABBAABBAABBAABBAABBA ^^* | 262 | ’BAA’ | 259 | ’BA’ |
| ABBAABBAABBAABBAABBA ^^^* | 263 | ’ABBA’ | 261 | ’ABB’ |
| ABBAABBAABBAABBAABBA ^^* | 264 | ’AAB’ | 260 | ’AA’ |
| ABBAABBAABBAABBAABBA ^^* | 265 | ’BBA’ | 258 | ’BB’ |
| ABBAABBAABBAABBAABBA ^^^* | 266 | ’AABB’ | 264 | ’AAB’ |
| ABBAABBAABBAABBAABBA ^^ | 259 | ’BA’ |