huffman coding
Concept
Huffman coding
- lossless compression
- high frequency symbols are given the shortest length code
- prefix code, symbols can have different length codes
- need to provide dictionary (overhead)
- used in:
- PNG
- JPEG (end)
Algorithm
- sort the symbols by frequency
- join the two smallest groups together, add the total frequency
- a group is either:
- a symbol
- a pair of groups
- a group is either:
- repeat step 2 until there is only one group encompassing all the symbols
- form the huffman tree
- form the dictionary
abaabc
- frequency table
- huffman tree
huffman trees defer from prefix trees since only the leaf nodes
- dictionary
the code is a prefix code, so no prefix of any code is another code that encodes a symbol
abaabc -> 0 10 0 0 10 11
6×8 bits 9 bits + dict
Application
Encoding text (short)
this is an example for huffman encoding
| space | 6 |
|---|---|
| N | 4 |
| A | 3 |
| E | 3 |
| F | 3 |
| I | 3 |
| H | 2 |
| M | 2 |
| O | 2 |
| S | 2 |
| C | 1 |
| D | 1 |
| G | 1 |
| L | 1 |
| P | 1 |
| R | 1 |
| T | 1 |
| U | 1 |
| X | 1 |
Encoding text (long)
data compression is the process of encoding information using fewer bits than the original representation compression can be either lossy or lossless
| space | 21 |
|---|---|
| S | 16 |
| E | 15 |
| O | 14 |
| I | 12 |
| N | 12 |
| R | 10 |
| T | 9 |
| A | 7 |
| C | 5 |
| H | 4 |
| L | 4 |
| P | 4 |
| F | 3 |
| G | 3 |
| M | 3 |
| B | 2 |
| D | 2 |
| U | 1 |
| W | 1 |
| Y | 1 |